<a id='title'></a>
# <bold><font color=gray>Jupyter App:</font><font color=darkpink> A Student's Analysis of his Domestic Thermal Footprint</font></bold>
## <bold><font color=white>=================</font><font color=purple>Or, how the Raspberry Pi saved me $5,000.</font></bold>

## <font size=5 color=darkpink>Abstract:</font>

Concerned about utility costs during the summer peak months, we looked for ways to reduce spending. Before making any big-ticket commitments, we started by looking at ways to improve utility efficiency. We lacked current information about the home's "domestic thermal qualities" (its HVAC and insulation efficiencies), so we set out to study the impact of the outdoor environment on our home's a/c usage and to assess the a/c's cooling ability. The challenge was to make reliable assessments within a budget. The middle game was to evaluate the a/c cooling unit's effectiveness. The end game was to determine whether the overall goal could be met economically by making small internal changes that promote cooling efficiency. <font color=black>**Methods:**</font> Raspberry Pi microcomputers were fitted with sensors to measure temperature and humidity and to stream the data to the cloud. The sensors were a mix of DHT22 and DHT11 units, each collecting temperature and humidity. A weather polling service, scripted in Python, collected weather information for 4 parts of the city, including the area where the study was conducted. The sensors collected data each minute, while the weather poll collected every 10 minutes. <font>**Results:**</font> A week's worth of data was processed using pandas. We found that the weather fluctuates from one part of the city to another and that outdoor humidity was negatively correlated with temperature. Indoors, conditions varied significantly from room to room, and humidity was weakly negatively correlated with temperature. The hottest room was filled with electronics and had a sun-facing window with only minimal covering. The coldest room stayed generally 2 degrees cooler than the remaining rooms. The ventilation system was overcooling 1 room in order to manage the temperature in the hottest room.
On a positive note, the range of humidity values collected indoors showed that the a/c is effective at removing humidity. As the temperature in the majority of rooms stayed within a consistent range near the HVAC thermostat setting, we consider the a/c's ability to cool acceptable. **The correlation between weather and internal environment was 4.6% for temperature and even lower, at 3%, for humidity.** This supports the effectiveness of the a/c cooling unit. <font>**Conclusions:**</font> A big-ticket HVAC expenditure is not needed at this time. By restricting the amount of direct sunlight on the hottest room, running a dehumidifier near the thermostat, and restricting the ventilation going to the coldest room, we hope to balance the rates at which the rooms cool and reduce the amount of time the a/c is engaged.


    
## <font size=5 color=gray>Notebook Purpose: </font><font color=darkblue>Process and Present Data using Pandas </font><font color=darkblue> including:</font>
<font color=white size=1>==========================================================================================================================================================================</font>
<font size=2  color=gray>> using </font><font color=darkblue>**JSON, Lists, Dictionaries, Loops, Dictionary Writer, Z-scores with SciPy and Visualizing with MatPlotLib**</font>
<font color=white size=1>==========================================================================================================================================================================</font>
<font size=2  color=gray> via ugly colors and links.</font>
<a href ='#top'>Jump to Table of Contents</a>

### <a id='top'></a>
<font color=black size=1>=============================================================================================================</font>
### <font size=5 color=darkpink>Table of Contents</font>
<font color=black size=1>=============================================================================================================</font>

<a href='#methods'><font color=darkpink>**Methods**</font></a>

<a href='#section1'><font color=darkblue>**Section1:     Process Atlanta weather data (External Data)**</font></a>

<a href='#sec1pt1'>Part 1 - Read in raw JSON, write out CSV spreadsheet</a>

<a href='#sec1pt2'>Part 2 - Review Weather MetaData</a>

<a href='#sec1pt3'>Part 3 - Data Edits - Format DATES</a>

<a href='#sec1pt4'>Part 4 - Breakout DataFrames by Location</a>

<a href='#sec1pt5'>Part 5 - Visualize Weather Data for Completeness</a>

<a href='#section2'><font color=darkblue>**Section2:     Process Home climate (Internal Data)**</font></a>

<a href='#sec2pt1'>Part 1 - Read, write and remove dups from Environment Data CSV spreadsheet</a>

<a href='#sec2pt2'>Part 2 - Review Environment Data current state</a>

<a href='#sec2pt3'>Part 3 - Renaming Columns and Replacing Data</a>

<a href='#sec2pt4'>Part 4 - Breakout DataFrames by Location</a>

<a href='#sec2pt5'>Part 5 - Visualize and Assess Environment Data</a>

<a href='#results'><font color=darkpink>**Results**</font></a>

<a href='#sec3pt1'>Part 1 - Present Weather findings</a>

<a href='#sec3pt2'>Part 2 - Present Environment findings</a>

<a href='#sec3pt3'>Part 3 - Present Environment Correlations findings</a>

<a href='#conclusions'><font color=darkpink>**Conclusions**</font></a>

<a href='#section4'><font color=darkblue>**Section1:    Comparative Analysis**</font></a>

<a href='#sec4pt2'><font color=darkblue>**Section2:     Conclusions**</font></a>

<a href='#sec4pt3'><font color=darkblue>**Section3:     Limitations and Future Plans**</font></a>

<a id='methods'></a>
<font color=black size=1>=============================================================================================================</font>


<font color=darkpink size=10>Methods</font>

<font color=black size=1>=============================================================================================================</font>

<a href ='#top'>Jump to Table of Contents</a>

<font color=black>This project is part of a larger, ongoing IOT project in which sensors like these could monitor a warehouse to assist with environmental controls.</font>
    
**Initial development**

The home was chosen as the location for the proof of concept (P.O.C.), offering zero cost and complete control over the site. The idea was to complete the P.O.C. and then add more robust, sophisticated features when implementing in larger or more complex locations where the environment needs managing. Sensors more accurate than the ones we chose were considered cost-prohibitive and overly complex for a P.O.C. using ARM microcomputers (the Raspberry Pi). Eventually this proof-of-concept functionality will be absorbed by the larger, ongoing IOT project.


<font size=10 color=white></font>
**Locations**

As mentioned, testing in a home had the highest utility in terms of cost and usage. At the time of this project, access to a warehouse was still pending and considered out of reach. Other IOT projects that tie into the larger IOT project were tested and demonstrated in an office-complex presentation room; this, too, was considered out of reach. For the weather, 4 parts of Atlanta were tracked: the north side of the perimeter, Marietta, Stone Mountain, and downtown at the Coca-Cola Olympic Village. The rationale was to take the radius between the home and the downtown Capitol and extend it east and west. Latitude and longitude were captured along with the weather information, for later use in visualizing the locations (if desired) and to pin down exactly where each weather reading was taken.

**Measures**

The sensors were not calibrated. Upon initial activation of each sensor, the data captured was compared to a true thermostat. Any sensor whose temperature reading was off by more than 5 degrees Celsius was discarded and replaced. We found a high correspondence between the accuracy of the reported temperature and the accuracy of the reported humidity: when a sensor reported an unacceptable temperature, its humidity reading was also highly disparate. Take note: the DHT22 sensor, which is more accurate than the DHT11 sensor, requires a 10K resistor to keep the microcomputer port current from impacting the reported signal (the reported temp and humidity). The additional cost was less than .01 per resistor (the actual cost was 2.00, as buying resistors in bulk is cheaper than purchasing individually).
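
The acceptance check for a newly activated sensor can be sketched as a small helper. This is a minimal sketch and not the procedure actually scripted for the project: the function names, the Fahrenheit conversion, and the sample readings are assumptions made for illustration, and the 5 degree figure is read here as the maximum allowed difference from the reference reading.

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def sensor_is_acceptable(sensor_c, reference_c, tolerance_c=5.0):
    """Return True when the sensor reads within tolerance_c of the reference."""
    return abs(sensor_c - reference_c) <= tolerance_c


# Hypothetical readings: the reference device shows 76 F.
reference_c = fahrenheit_to_celsius(76.0)
print(sensor_is_acceptable(25.1, reference_c))   # close to the reference
print(sensor_is_acceptable(31.0, reference_c))   # more than 5 C away
```

A sensor that fails this check would be swapped out before any data collection starts.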


**Hardware Specifics** courtesy of https://www.mouser.com/ds/2/737/dht-932870.pdf and https://www.raspberrypi.org/products/raspberry-pi-3-model-b-plus/

**DHT-11**

3 to 5V power

2.5mA max current (while requesting data)

humidity readings with ±5% accuracy

temperature readings ±2°C accuracy

No more than 1 Hz sampling rate (once every second)

Body size 15.5mm x 12mm x 5.5mm

**DHT-22**

3 to 5V power

2.5mA max current (while requesting data)

humidity readings with ±2% accuracy

temperature readings ±0.5°C accuracy

No more than 0.5 Hz sampling rate (once every 2 seconds)

Body size 15.1mm x 25mm x 7.7mm

<a id='section1'></a>
# <font color=darkpink>Section 1: Process Atlanta weather data (External data)</font>
<a href ='#top'>Jump to Table of Contents</a>

## <font color=darkblue>Part 1 - Read in raw JSON, write out CSV spreadsheet</font>
<font color=gray>step 1 - Edit check for valid JSON format.</font>
<a id='sec1pt1'></a>

<a href ='#top'>Jump to Table of Contents</a>

In [None]:
import csv
import json
import matplotlib
import matplotlib.pyplot as plt
import os
import pandas as pd
import seaborn as sns
from scipy import stats
from matplotlib import style
#------------------------------------------------#
# Set processing parameters and directives
#------------------------------------------------#
style.use('ggplot')
%matplotlib inline
plt.rcParams['figure.figsize'] = (16,12)
plt.rcParams['font.size'] = 8
#------------------------------------------------#
# Set the appropriate path
#------------------------------------------------#
home_path = 'C:\\users\\bucbo_000\\Desktop'

if os.path.isdir(home_path):
    os.chdir(home_path)
else:
    home_path='./'


#------------------------------------------------#
# Open the raw weather file and parse it line by line.
# Each line should hold one JSON object (a dictionary).
# If a line is not valid JSON, json.loads raises
# json.JSONDecodeError; we report it and move on
# to the next record.
#------------------------------------------------#
my_df = []
with open('weather_collect.txt', 'r', encoding='utf-8') as data_file:
    for x in data_file:
        try:
            my_df.append(json.loads(x))
        except json.JSONDecodeError:
            print("skipping: JSON format invalid for:", x)
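
The edit check can be tried without the real `weather_collect.txt` by feeding a few lines from memory. The sketch below is illustrative only: the helper name `parse_json_lines` and the sample records (including their dates and values) are assumptions, not data from the study.

```python
import json


def parse_json_lines(lines):
    """Split raw lines into parsed records and rejected lines."""
    records, rejected = [], []
    for line in lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            rejected.append(line)
    return records, rejected


# Hypothetical sample: two valid records and one truncated line.
sample = [
    '{"sysdate": "2018-08-01 12:00:00", "loc": "Atlanta Georgia", "temp": 88.1, "hum": 0.62}',
    '{"sysdate": "2018-08-01 12:10:00", "loc": "Atlanta Georgia", "temp": ',
    '{"sysdate": "2018-08-01 12:20:00", "loc": "Atlanta Georgia", "temp": 87.4, "hum": 0.65}',
]
records, rejected = parse_json_lines(sample)
print(len(records), "accepted,", len(rejected), "rejected")
```

Keeping the rejected lines in a list, rather than only printing them, makes it easy to report how much of the collection was lost to malformed records.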

<font color=gray>step 2 - Write valid data to list of dictionaries for consumption by the dictionary writer (DictWriter), a class in the csv module.</font>

<a href ='#top'>Jump to Table of Contents</a>

In [None]:
#------------------------------------------------#
# Define output file and header line
#------------------------------------------------#
with open('weather.csv', 'w', newline='') as csvfile:
    fieldnames = ['sysdate', 'loc', 'temp', 'hum','forecast', 'lat', 'lon']
# This opens the `DictWriter`.
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
#
# Write out the header row (this only needs to be done once!).
    writer.writeheader()
#------------------------------------------------#
# Read in, Write out and Loop 
#------------------------------------------------#
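    for a_df in my_df:
        try:
            writer.writerow(a_df)
        except (ValueError, AttributeError):
            # ValueError: keys outside fieldnames; AttributeError: record is not a dictionary
            print("Unable to parse, skipping record...")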
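
To see which records the dictionary writer accepts, the same pattern can be run against an in-memory buffer. This is a sketch under stated assumptions: the two sample rows and the `pressure` key are made up, and `io.StringIO` stands in for `weather.csv`. With its default settings, `csv.DictWriter` fills missing keys with an empty string and raises `ValueError` for keys that are not in `fieldnames`.

```python
import csv
import io

fieldnames = ['sysdate', 'loc', 'temp', 'hum', 'forecast', 'lat', 'lon']

# Hypothetical records: the second carries a key that is not in fieldnames.
rows = [
    {'sysdate': '2018-08-01 12:00:00', 'loc': 'Atlanta Georgia', 'temp': 88.1},
    {'sysdate': '2018-08-01 12:10:00', 'loc': 'Atlanta Georgia', 'pressure': 1012},
]

buffer = io.StringIO()          # in-memory stand-in for weather.csv
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()

written, skipped = 0, 0
for row in rows:
    try:
        writer.writerow(row)    # missing keys are written as empty fields
        written += 1
    except ValueError:          # raised for keys outside fieldnames
        skipped += 1

print(buffer.getvalue())
print(written, "written,", skipped, "skipped")
```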

<font color=gray>step 3 - Build DataFrame from list of dictionaries.</font>

<a href ='#top'>Jump to Table of Contents</a>

In [None]:
ext_df = pd.read_csv(os.path.join(home_path, 'weather.csv'), usecols=range(0,5))

<a id='sec1pt2'></a>
## <font color=darkblue>Part 2 - Review Weather MetaData.</font>

<a href ='#top'>Jump to Table of Contents</a>

<font color=gray>step 1 - Show DataFrame sample</font>

    
<font size=5 color=blue>Table 1a. displays a random sampling of the weather data collected</font>

In [None]:
ext_df.sample(n=10)

<font color=gray>step 2 - Review summary of weather data collected.</font>
    
<font size=5 color=blue>Table 1b. summarizes the mean temperature by location and forecast</font>

In [None]:
a = ext_df.groupby(['loc', 'forecast'])['temp'].mean().unstack().dropna(axis=1)
a

<font color=gray>step 3 - Review content metadata stats.</font>

<a href ='#top'>Jump to Table of Contents</a>

<font size=5 color=blue>Table 1c. displays the weather ***metadata*** statistics</font>

In [None]:
print("------------------------------------")
print("Weather Data has ", ext_df.shape[0], "Rows and", ext_df.shape[1], "Columns of types:")
print("------------------------------------")
print(ext_df.dtypes)
print(" ")
print("------------------------------------")
print(f"Weather Data nulls search:")
print("------------------------------------")
print(ext_df.isnull().sum())
print(" ")
print()
print("------------------------------------")
print(f"Weather Data counts (raw)")
print("------------------------------------")
for i in ext_df.columns:
    print("The count for", i, "is", ext_df[i].count())
print(" ")
print("------------------------------------")
print(f"Weather Data counts excluding nulls:")
print("------------------------------------")
print(ext_df.dropna().count())
print(" ")
print("------------------------------------")
print(f"Weather Data statistics:")
print("------------------------------------")
ext_df.describe([0])

<font size=1 color=black>===============================================================================================================</font>
## <font color=darkpink>Clean the weather data.</font>
<font size=1 color=black>===============================================================================================================</font>

### <font color=red>Discrepancy!</font><font color=darkgray> The </font><font color=darkorange>weather</font><font color=darkgray> humidity is a </font><font color=darkorange> fraction between 0 and 1</font><font color=darkgray> while the </font><font color=blue> indoors </font><font color=darkgray> humidity is a </font><font color=blue>whole-number percentage</font>

<font color=gray>step 4 - Convert the humidity before moving on.</font>

In [None]:
ext_df['hum'] = (ext_df['hum'] * 100)
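
Because a notebook cell can be re-run, multiplying by 100 a second time would silently inflate the humidity. One way to guard against that is sketched below. The helper `humidity_to_percent` and the sample values are assumptions for illustration; the check assumes that a column whose maximum is at most 1.0 is still stored as a fraction.

```python
import pandas as pd

# Hypothetical weather rows with humidity stored as a fraction.
demo = pd.DataFrame({'hum': [0.62, 0.65, 0.71]})


def humidity_to_percent(frame, column='hum'):
    """Scale a 0-1 humidity column to 0-100; leave it alone if already scaled."""
    out = frame.copy()
    if out[column].max() <= 1.0:
        out[column] = out[column] * 100
    return out


once = humidity_to_percent(demo)
twice = humidity_to_percent(once)   # second call leaves the values unchanged
print(once['hum'].tolist())
print(twice['hum'].tolist())
```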


<font color=gray>step 5 - Drop null weather data.</font>

In [None]:
ext_df.dropna(axis=0, inplace=True)

<font color=gray>step 6 - Take sample and verify results.</font>

In [None]:
ext_df.sample(n=5)

<a id='sec1pt3'></a>
## <font color=darkblue>Part 3 - Format DATES and build indices.</font>

<a href ='#top'>Jump to Table of Contents</a>


<font color=gray>step 1 - Format the sysdate column to be a pandas datetime object</font>

In [None]:
ext_df['sysdate'] = pd.to_datetime(ext_df['sysdate'])
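
If some `sysdate` values are malformed, `pd.to_datetime` raises by default. A tolerant variant is sketched below with made-up timestamps. Passing `errors='coerce'` is a choice shown for illustration, not something the notebook itself does.

```python
import pandas as pd

# Hypothetical timestamps as they might appear in the CSV; one is malformed.
raw = pd.Series(['2018-08-01 12:00:00', '2018-08-01 12:10:00', 'not a date'])

# errors='coerce' turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(raw, errors='coerce')
print(parsed.dtype)
print(parsed.isna().sum(), "value(s) could not be parsed")
```

Rows left as NaT can then be dropped with `dropna` before the column is used as an index.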

<font color=gray>step 2 - Validate the DATE conversion worked.</font>

<a href ='#top'>Jump to Table of Contents</a>

In [None]:
ext_df.dtypes

<font color=gray>step 3 - Build index for visualization preparation</font>

<a href ='#top'>Jump to Table of Contents</a>

In [None]:
ext_df.set_index(['sysdate'], inplace=True)

<a id='sec1pt4'></a>
## <font color=darkblue>Part 4 - Breakout DataFrames by Location.</font>

<a href ='#top'>Jump to Table of Contents</a>

<font color=gray>step 1 - Create new dataframes by location in city.</font>

In [None]:
ext_atl = ext_df[ext_df['loc'] == 'Atlanta Georgia']
ext_marietta = ext_df[ext_df['loc'] == 'Big Chicken Marietta Georgia']
ext_stonemtn = ext_df[ext_df['loc'] == 'Stone Mountain Park']
ext_coke = ext_df[ext_df['loc'] == 'Coca-Cola Olympic Park']

<font color=gray>step 2 - Take sub-group sample stat to ensure dataframes built correctly.</font>

In [None]:
ext_atl['forecast'].value_counts()

<a id='sec1pt5'></a>
## <font color=darkblue>Part 5 - Visualize External Data for Completeness.</font>

<a href ='#top'>Jump to Table of Contents</a>

<font color=gray>step 1 - Breakout external data by groups</font>

In [None]:
ext_df.drop(columns = ['forecast']).groupby(['loc']).count()


<font color=gray>step 3 - Merge external data groups</font>

In [None]:
ow = ext_atl.join(ext_marietta, how='left', lsuffix='_atl', rsuffix='_marietta')
sw = ext_coke.join(ext_stonemtn, how='left', lsuffix='_coke', rsuffix='_stnmtn')
outside = ow.join(sw, how='left')
outside.sort_index(inplace=True)
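
The effect of joining two location frames on their timestamp index can be seen on a toy example. The frames and readings below are hypothetical; only the `join` call with `how='left'` and the suffix arguments mirrors the cell above.

```python
import pandas as pd

# Hypothetical readings; the second frame is missing the 12:10 poll.
left = pd.DataFrame(
    {'temp': [88.1, 87.4, 86.9]},
    index=pd.to_datetime(['2018-08-01 12:00', '2018-08-01 12:10', '2018-08-01 12:20']),
)
right = pd.DataFrame(
    {'temp': [85.0, 84.2]},
    index=pd.to_datetime(['2018-08-01 12:00', '2018-08-01 12:20']),
)

# A left join keeps every timestamp of the left frame; timestamps that the
# right frame lacks show up as NaN in the right-hand columns.
merged = left.join(right, how='left', lsuffix='_atl', rsuffix='_marietta')
print(merged)
```

This matters for the weather data because the four locations are polled separately, so their timestamps may not line up exactly.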

<font color=gray>step 4 - Do basic plot as a test</font>

### <font color = blue size=5>Figure 1a - Initial display of weather temp and humidity</font>

 <font color=purple size=4> This plot reveals a lot, but we skip the details here to keep the presentation on track.</font>

In [None]:
outside.plot(figsize=(22,11))


### <font color = blue size=5>Figure 1b - Same data as Figure 1a, but smoothed with rolling mean</font>

In [None]:
# Smooth the Atlanta readings with a 10-sample rolling mean and plot both series.
avg_temp = ext_atl['temp'].rolling(10).mean().dropna()
avg_hum = ext_atl['hum'].rolling(10).mean().dropna()
ax = avg_hum.plot(label='avg_hum', figsize=(22,11))
avg_temp.plot(ax=ax, label='avg_temp')
ax.legend()
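
How the rolling mean behaves at the start of a series is easiest to see on a short example. The readings and the window of 3 are assumptions chosen to keep the output small; the notebook uses a window of 10.

```python
import pandas as pd

# Hypothetical temperature readings taken at a fixed interval.
temps = pd.Series([80.0, 82.0, 84.0, 86.0, 88.0])

# A window of 3 averages each reading with the two before it; the first
# two results are NaN because the window is not yet full.
smoothed = temps.rolling(3).mean()
print(smoothed.tolist())
print(smoothed.dropna().tolist())
```

With weather polled every 10 minutes, a window of 10 therefore smooths over roughly 100 minutes and drops the first 9 points.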

<font color=gray>step 5 - Present external temperatures.</font>

### <font color = blue size=5>Figure 1c - Initial display of weather temp</font>

In [None]:
outside.drop(columns=['hum_atl', 'hum_coke','hum_stnmtn', 'hum_marietta']).plot(figsize=(22,11))

<font color=gray>step 6 - Present external humidities.</font>

### <font color = blue size=5>Figure 1d - Initial display of weather humidity</font>

In [None]:
outside.drop(columns=['temp_atl', 'temp_coke','temp_stnmtn', 'temp_marietta']).plot(figsize=(22,11))

<a id='section2'></a>
# <font color=darkpink>Section 2: Process Home climate (Internal Data)</font>
<a href ='#top'>Jump to Table of Contents</a>


### <a id='sec2pt1'></a>
## <font color=darkblue>Part 1 - Read, write and remove dups from environment data csv spreadsheet</font>


<a href ='#top'>Jump to Table of Contents</a>

<font color=gray>step 1 - Edit and Build step combined - See Section 1 for step by step details</font>


In [None]:
import csv
import datetime
import json
import matplotlib
import matplotlib.pyplot as plt
import os
import pandas as pd
import seaborn as sns

#------------------------------------------------#
# Set processing parameters and directives
#------------------------------------------------#
%matplotlib inline
plt.rcParams['figure.figsize'] = (16,12)
plt.rcParams['font.size'] = 8
#------------------------------------------------#
# Set the appropriate path
#------------------------------------------------#


home_path = 'C:\\users\\bucbo_000\\Desktop'

if os.path.isdir(home_path):
    os.chdir(home_path)
else:
    home_path='./'
    
# Open the raw sensor file and convert each JSON line to a dictionary.
# Lines that are not valid JSON are reported and skipped.
my_df = []
with open('mqtt.txt', 'r', encoding='utf-8') as data_file:
    for x in data_file:
        try:
            my_df.append(json.loads(x))
        except json.JSONDecodeError:
            print("skipping: JSON format invalid for:", x)
        
# Keep only records that do not mention "env"; count accepted vs. rejected.
cnt_accept = 0
cnt_reject = 0
content_df = []
for rec in my_df:
    if "env" not in str(rec):
        content_df.append(rec)
        cnt_accept += 1
    else:
        print("skipping: JSON content invalid")
        cnt_reject += 1

print("We now have", cnt_accept, "records to use after cleaning", cnt_reject, "rows")

cnt_accept = 0
cnt_reject = 0

# Define output file and header line
with open('mqtt.csv', 'w', newline='') as csvfile:
    fieldnames = ['device', 'sysdate', 'temp', 'hum', 'state']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    # Write out the header row (this only needs to be done once!).
    writer.writeheader()
    # Write each accepted record. DictWriter raises ValueError for keys
    # outside fieldnames; non-dictionary records raise AttributeError.
    for a_df in content_df:
        try:
            writer.writerow(a_df)
            cnt_accept += 1
        except (ValueError, AttributeError):
            print("Unable to parse, skipping...")
            cnt_reject += 1

print("We now have", cnt_accept, "Dictionary records to use after cleaning", cnt_reject, "rows")


int_df = pd.read_csv(os.path.join(home_path, 'mqtt.csv'), usecols=range(0,4))

#######################################################
#
int_df = int_df.drop_duplicates()
#
#######################################################

int_df['sysdate'] = pd.to_datetime(int_df['sysdate'])

<a id='sec2pt2'></a>
## <font color=darkblue>Part 2 - Review Environment Metadata</font>

<a href ='#top'>Jump to Table of Contents</a>

<font color=gray>step 1 - Review and assess.</font>

In [None]:
print("------------------------------------")
print("Environment Data has ", int_df.shape[0], "Rows and", int_df.shape[1], "Columns of types:")
print("------------------------------------")
print(int_df.dtypes)
print(" ")
print("------------------------------------")
print(f"Environment Data nulls search:")
print("------------------------------------")
print(int_df.isnull().sum())
print(" ")
print("------------------------------------")
print(f"Environment Data counts (raw)")
print("------------------------------------")
for i in int_df.columns:
    print("The count for", i, "is", int_df[i].count())
print("------------------------------------")
print(f"Environment Data counts:")
print("------------------------------------")
print(int_df.dropna().count())
print(" ")
print("------------------------------------")
print(f"Environment Data statistics:")
print("------------------------------------")
print(int_df.describe())
print("device cardinality:", int_df['device'].value_counts())
print("temp mean:", int_df['temp'].mean())
print("humidity mean:", int_df['hum'].mean())

<font color=gray>step 2a - Review temp for outliers</font>

### <font color = blue size=5>Table 2a - Initial display of environmental temp</font>

In [None]:
int_df['temp'].plot()
new_df = int_df.copy()

<font color=gray>step 2b - Clean up outliers</font>

## <font color=blue>Notice</font><font color=purple> the wide swing in </font><font color=red>temp </font><font color=purple> ranges. Let's use Z-Score and clean up the temp outliers.</font>

In [None]:
print(abs(stats.zscore(int_df['temp'])).min())
print(abs(stats.zscore(int_df['temp'])).mean())
print(abs(stats.zscore(int_df['temp'])).std())
print(abs(stats.zscore(int_df['temp'])).max())
int_df = int_df[abs(stats.zscore(int_df['temp'])) < 2.0]
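
What the Z-score filter does can be shown with the standard library alone. The sketch below uses made-up readings with one obvious glitch. It computes the Z-score with the population standard deviation, which matches the default (`ddof=0`) of `scipy.stats.zscore`, and keeps values within 2 standard deviations of the mean.

```python
from statistics import mean, pstdev


def zscores(values):
    """Z-score of each value using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical readings in F with one obvious glitch.
temps = [76.0, 77.0, 75.5, 76.5, 76.0, 77.5, 75.0, 76.0, 140.0, 76.5]
kept = [t for t, z in zip(temps, zscores(temps)) if abs(z) < 2.0]
print(kept)
```

Note that a large outlier inflates both the mean and the standard deviation, so a second pass (as done for humidity afterwards) can still find further outliers.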

<font color=gray>step 2c - Review new data</font>

In [None]:
print("------------------------------------")
print("Environment Data statistics:")
print("------------------------------------")
int_df.describe()

<font color=gray>step 2d - Clean up outliers</font>

## <font color=blue>Notice</font><font color=purple> the wide swing in </font><font color=red>hum </font><font color=purple> ranges. Let's use Z-Score and clean up the hum outliers.</font>

In [None]:
print(abs(stats.zscore(int_df['hum'])).min())
print(abs(stats.zscore(int_df['hum'])).mean())
print(abs(stats.zscore(int_df['hum'])).std())
print(abs(stats.zscore(int_df['hum'])).max())
int_df = int_df[abs(stats.zscore(int_df['hum'])) < 2.0 ]

<font color=gray>step 2e - Review new data</font>

In [None]:
print("------------------------------------")
print("Environment Data statistics:")
print("------------------------------------")
int_df.describe()

<a id='sec2pt3'></a>
## <font color=darkblue>Part 3 - Rename columns, replace data and build index.</font>

<a href ='#top'>Jump to Table of Contents</a>


<font color=gray>step 1 - Change devices to match room location</font>

In [None]:
# Map each sensor's MAC address to the room it sits in.
room_names = {
    'B8:27:EB:76:5F:45': 'Living Room',
    'B8:27:EB:2D:40:28': 'Master Bedroom',
    'B8:27:EB:37:B0:F8': 'Guest Bedroom',
    'B8:27:EB:A9:D4:C2': 'Kitchen',
}
int_df = int_df.replace({'device': room_names})

<font color=gray>step 2 - COLUMN renaming step to make columns meaningful</font>

In [None]:
#int_df.rename(columns={'state': 'forecast'},inplace=True)
int_df.rename(columns={'device': 'room'}, inplace=True)

<font color=gray>step 3 - Set sysdate as index for internal data</font>

In [None]:
int_df.set_index(['sysdate'], inplace=True)

<a id='sec2pt4'></a>
## <font color=darkblue>Part 4 - Breakout DataFrames by location.</font>

<a href ='#top'>Jump to Table of Contents</a>

<font color=gray>step 1 - Create new dataframes by device or internal location</font>
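### <font color=blue>As this was recorded during the summer, any indoor temp below 70 F can be assumed to be a bad reading. The explicit range filter below is now commented out, since the Z-score cleanup above already removes these outliers.</font>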

In [None]:
mb = int_df[int_df['room'] == 'Master Bedroom']
lr = int_df[int_df['room'] == 'Living Room']
gb = int_df[int_df['room'] == 'Guest Bedroom']
kt = int_df[int_df['room'] == 'Kitchen']
#---------------------------------------------#
# Deprecated with Z-scoring above
#---------------------------------------------#
#mb = mb[mb['temp'] > 69]
#mb = mb[mb['temp'] < 90]
#lr = lr[lr['temp'] > 69]
#lr = lr[lr['temp'] < 90]
#gb = gb[gb['temp'] > 69]
#gb = gb[gb['temp'] < 90]
#kt = kt[kt['temp'] > 69]
#kt = kt[kt['temp'] < 90]

<a id='sec2pt5'></a>
## <font color=darkblue>Part 5 - Visualize and Assess Environment Data.</font>

<a href ='#top'>Jump to Table of Contents</a>

<font color=gray> step 1 - Display data counts by device</font>

In [None]:
int_df.groupby(['room']).count()


<font color=gray>step 2 - Visualize the internal data's metadata</font>
    
### <font color = blue size=5>Table 2b - Initial display of environment metadata with displays for location, temp, humidity.</font>

In [None]:
for col in int_df.columns:
    plot_data = int_df[col].dropna()
    fig, ax = plt.subplots()
    ax.plot(plot_data.index.values, plot_data.values, label=col)

<font color=gray>step 3 - Merge internal data groups</font>

In [None]:
br = mb.join(gb, how='left',  lsuffix='_mstr', rsuffix='_guest')
lk = lr.join(kt, how='left', lsuffix='_living', rsuffix='_kit')
inside = br.join(lk, how='left')

<font color=gray>step 4- Visualize internal data</font>

### <font color = blue size=5>Table 2c - Cleaner environment temp with humidity</font>

In [None]:
inside.plot()

<a id='results'></a>
<font color=black size=1>=============================================================================================================</font>


<font color=darkpink size=10>Results</font>

<font color=black size=1>=============================================================================================================</font>

<a href ='#top'>Jump to Table of Contents</a>

A total of 4 sensors were used to monitor the indoor environment. The kitchen (kt) and master bedroom (mb) sensors are DHT11 units, so they are less accurate than the DHT22 sensors in the other rooms. The sensors took a reading every minute for at least 5 days straight. **The thermostat stayed fixed at 76 F, as a control.** 

**Identification of Items**
The kitchen and master bedroom both have direct plumbing, which would suggest higher room humidity than in rooms without plumbing. Inexplicably, the living room had higher humidity than the kitchen. Both the kitchen and the master bath were checked for leaks and none were found. Further research is needed to explain why the living room consistently had higher humidity than the kitchen.

<a id='sec3pt1'></a>
## <font color=darkblue>Part 1 - Present Weather Findings</font>


<a href ='#top'>Jump to Table of Contents</a>

<font color=gray>step 1 - Present humidity corresponding to temp.</font>

    
### <font color = blue size=5>Table 3a - Display of weather correlating humidity to temp.</font>

In [None]:
ax = ext_atl.plot(kind="scatter", x='hum', y="temp",label="ATL", c='b')
ax2 = ext_marietta.plot(kind="scatter", x='hum', y="temp",label="MAR", c='orange', ax = ax)
ax3 = ext_coke.plot(kind="scatter", x='hum', y="temp",label="COKE", c='r' , ax = ax)
ax4 = ext_stonemtn.plot(kind="scatter", x='hum', y="temp",label="STNMNT", c='brown' , ax = ax)

<font color=gray>step 2 - Show the weather elements standard deviations between parts of city</font>

In [None]:
print("######################################")
print("#Weather Standard Deviations between parts of the city.")
print("######################################")
print("# Keys: ext_atl      General ATL weather ")
print("#       ext_coke     Coca-Cola Olympic Village")
print("#       ext_marietta KFC 40' chicken in Marietta")
print("#       ext_stonemtn Stone Mountain Park, Stone Mtn., Ga.")
print("#")
print('# ext_atl      temp & hum standard deviation: {0:4.2f} \t{1:4.2f}'.format(ext_atl['temp'].std(), ext_atl['hum'].std()))
print('# ext_coke     temp & hum standard deviation: {0:4.2f} \t{1:4.2f}'.format(ext_coke['temp'].std(), ext_coke['hum'].std()))
print('# ext_marietta temp & hum standard deviation: {0:4.2f} \t{1:4.2f}'.format(ext_marietta['temp'].std(), ext_marietta['hum'].std()))
print('# ext_stonemtn temp & hum standard deviation: {0:4.2f} \t{1:4.2f}'.format(ext_stonemtn['temp'].std(), ext_stonemtn['hum'].std()))


<font color=gray>step 3 - Show the Weather correlation between parts of city</font>

In [None]:
print("######################################")
print("# External Environment Correlation Information.")
print("######################################")
print("# Keys: ext_atl      General ATL weather ")
print("#       ext_coke     Coca-Cola Olympic Village")
print("#       ext_marietta KFC 40' chicken in Marietta")
print("#       ext_stonemtn Stone Mountain Park, Stone Mtn., Ga.")
print("#")
# Correlate by column name so the result does not depend on column order
# or on non-numeric columns (loc, forecast) being present.
print('# Correlate ext_atl to ext_coke     Temperature:{0:4.1f}%'.format(ext_atl['temp'].corr(ext_coke['temp']) * 100))
print('#                                   Humidity:{0:4.1f}%'.format(ext_atl['hum'].corr(ext_coke['hum']) * 100))
print("#")
print('# Correlate ext_atl to ext_marietta Temperature:{0:4.1f}%'.format(ext_atl['temp'].corr(ext_marietta['temp']) * 100))
print('#                                   Humidity:{0:4.1f}%'.format(ext_atl['hum'].corr(ext_marietta['hum']) * 100))
print("#")
print('# Correlate ext_atl to ext_stonemtn Temperature:{0:4.1f}%'.format(ext_atl['temp'].corr(ext_stonemtn['temp']) * 100))
print('#                                   Humidity:{0:4.1f}%'.format(ext_atl['hum'].corr(ext_stonemtn['hum']) * 100))
print("#")
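
The percentages printed above are Pearson correlation coefficients multiplied by 100. As a sketch of what that number measures, the coefficient can be computed by hand for two short series. The helper `pearson` and the paired readings are assumptions for illustration, not values from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical paired readings from two locations at the same timestamps.
atl = [88.0, 86.0, 84.0, 85.0, 89.0]
coke = [87.0, 85.5, 83.0, 84.5, 88.5]
print('Temperature:{0:4.1f}%'.format(pearson(atl, coke) * 100))
```

A value near 100% means the two locations rise and fall together; a value near 0% means one tells us little about the other. The coefficient is only meaningful when the readings are paired by timestamp.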

<a id='sec3pt2'></a>
## <font color=darkblue>Part 2 - Present Environment findings</font>


<a href ='#top'>Jump to Table of Contents</a>

<font color=gray>step 1 - Present humidity corresponding to temp.</font>

    
### <font color = blue size=5>Table 3b - Display of environment correlating humidity to temp.</font>

In [None]:
ax = mb.plot(kind="scatter", x='hum', y="temp",label="MSTR", c='b')
ax2 = lr.plot(kind="scatter", x='hum', y="temp",label="LIVING", c='orange', ax = ax)
ax3 = gb.plot(kind="scatter", x='hum', y="temp",label="GUEST", c='r' , ax = ax)
ax4 = kt.plot(kind="scatter", x='hum', y="temp",label="KITCHEN", c='brown' , ax = ax)

<font color=gray>step 2 - Show the environment elements standard deviations between parts of home</font>

In [None]:
print("######################################")
print("#Environment Standard Deviations between parts of the home.")
print("######################################")
print("# Keys: kt           Kitchen - nearest thermostat ")
print("#       lr           Living Room")
print("#       mb           Master Bed")
print("#       gb           Guest Bed")
print("#")
print('# kt      temp & hum standard deviation: {0:4.2f} \t{1:4.2f}'.format(kt['temp'].std(), kt['hum'].std()))
print('# lr      temp & hum standard deviation: {0:4.2f} \t{1:4.2f}'.format(lr['temp'].std(), lr['hum'].std()))
print('# mb      temp & hum standard deviation: {0:4.2f} \t{1:4.2f}'.format(mb['temp'].std(), mb['hum'].std()))
print('# gb      temp & hum standard deviation: {0:4.2f} \t{1:4.2f}'.format(gb['temp'].std(), gb['hum'].std()))
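
The same per-room spread can also be gathered into a single table. The following is a minimal sketch, not the notebook's own method: it uses made-up readings in place of the real `kt`, `lr`, `mb` and `gb` frames, and the `spread` values and the `rooms` dictionary are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up readings standing in for the real per-room frames (kt, lr, mb, gb).
rng = np.random.default_rng(2)
spread = {"kt": 0.4, "lr": 0.6, "mb": 0.3, "gb": 0.9}  # hypothetical temperature spreads
rooms = {
    name: pd.DataFrame({
        "temp": rng.normal(74, s, 300),
        "hum": rng.normal(50, 4 * s, 300),
    })
    for name, s in spread.items()
}

# Stack the frames with the room name as the outer index level, then aggregate per room.
stacked = pd.concat(rooms, names=["room", "row"])
summary = stacked.groupby(level="room")[["temp", "hum"]].agg(["mean", "std"])
print(summary.round(2))
```

Keeping the mean next to the standard deviation makes it easier to tell a room that is steadily warm from one that swings widely.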

<font color=gray>step 3 - Show the environment correlations between parts of home</font>

In [None]:
print("######################################")
print("#Environment Correlations between parts of the home.")
print("######################################")
print("# Keys: kt           Kitchen - nearest thermostat ")
print("#       lr           Living Room")
print("#       mb           Master Bed")
print("#       gb           Guest Bed")
print("#")
print('# Correlate kt to lr      Temperature:{0:4.1f}%'.format(kt.corrwith(lr)['temp'] * 100))
print('#                            Humidity:{0:4.1f}%'.format(kt.corrwith(lr)['hum'] * 100))
print("#")
print('# Correlate kt to mb      Temperature:{0:4.1f}%'.format(kt.corrwith(mb)['temp'] * 100))
print('#                            Humidity:{0:4.1f}%'.format(kt.corrwith(mb)['hum'] * 100))
print("#")
print('# Correlate kt to gb      Temperature:{0:4.1f}%'.format(kt.corrwith(gb)['temp'] * 100))
print('#                            Humidity:{0:4.1f}%'.format(kt.corrwith(gb)['hum'] * 100))
print("#")
print('# Correlate mb to gb      Temperature:{0:4.1f}%'.format(mb.corrwith(gb)['temp'] * 100))
print('#                            Humidity:{0:4.1f}%'.format(mb.corrwith(gb)['hum'] * 100))
print("#")
print('# Correlate mb to lr      Temperature:{0:4.1f}%'.format(mb.corrwith(lr)['temp'] * 100))
print('#                            Humidity:{0:4.1f}%'.format(mb.corrwith(lr)['hum'] * 100))
print("#")
print('# Correlate gb to lr      Temperature:{0:4.1f}%'.format(gb.corrwith(lr)['temp'] * 100))
print('#                            Humidity:{0:4.1f}%'.format(gb.corrwith(lr)['hum'] * 100))
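
The six room pairs can also be generated rather than typed out. This is a minimal sketch under stated assumptions: the frames are synthetic stand-ins sharing one timestamp index, and the `rooms` dictionary and `pair_corr` table are hypothetical names that do not appear elsewhere in this notebook.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Synthetic stand-ins for the per-room frames (kt, lr, mb, gb); all values are made up.
rng = np.random.default_rng(0)
index = pd.date_range("2019-07-01", periods=240, freq="min")
base_temp = 74 + np.sin(np.linspace(0, 6, len(index)))
rooms = {}
for name, offset in {"kt": 0.0, "lr": 0.5, "mb": -2.0, "gb": 1.0}.items():
    temp = base_temp + offset + rng.normal(0, 0.2, len(index))
    hum = 50 - 2 * (temp - 74) + rng.normal(0, 0.5, len(index))
    rooms[name] = pd.DataFrame({"temp": temp, "hum": hum}, index=index)

# corrwith aligns on index and column name, so each result is looked up by label.
rows = []
for a, b in combinations(rooms, 2):
    corr = rooms[a].corrwith(rooms[b])
    rows.append({"pair": f"{a}-{b}", "temp": corr["temp"] * 100, "hum": corr["hum"] * 100})

pair_corr = pd.DataFrame(rows).set_index("pair")
print(pair_corr.round(1))
```

Because `corrwith` matches rows by index, the real frames would need timestamps that line up (for example, resampled to the minute) for the pairwise figures to be meaningful.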

<a id='sec3pt3'></a>
## <font color=darkblue>Part 3 - Present derived Weather Values</font>


<a href ='#top'>Jump to Table of Contents</a>

### <font color = blue size=5>Table 3c - Re-display the derived weather avg_data.</font>

In [None]:
ax = ext_df['avg_hum'].plot()
ax1 = ext_df['avg_temp'].plot(ax = ax, figsize=(22,11))
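
For readers unfamiliar with how derived values such as `avg_temp` and `avg_hum` can be produced, here is a minimal sketch of a rolling mean over weather readings polled every 10 minutes. The data is made up, and both the `ext_demo` frame and the `WINDOW` size are assumptions for illustration; the window actually used for the weather data is not shown in this section.

```python
import numpy as np
import pandas as pd

# Made-up weather readings at the 10-minute polling interval.
index = pd.date_range("2019-07-01", periods=36, freq="10min")
ext_demo = pd.DataFrame(
    {
        "temp": 85 + 5 * np.sin(np.linspace(0, 3, len(index))),
        "hum": 60 - 8 * np.sin(np.linspace(0, 3, len(index))),
    },
    index=index,
)

WINDOW = 6  # assumed: 6 samples x 10 minutes = a one-hour window
ext_demo["avg_temp"] = ext_demo["temp"].rolling(WINDOW).mean()
ext_demo["avg_hum"] = ext_demo["hum"].rolling(WINDOW).mean()

# The first WINDOW - 1 rows have no full window yet, so their averages are NaN.
print(ext_demo.head(WINDOW + 1).round(2))
```

A rolling mean only smooths the series it is computed from, which may be why the derived values tracked the reported values so closely in the plot.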


<a href ='#top'>Jump to Table of Contents</a>

### <font color = blue size=5>Table 3d - Display HeatMap weather avg_data to reported_data.</font>
##### <font color=purple>As noted in the Conclusions, the derived (rolling-average) values added no real insight.</font>

In [None]:
merged_df = ext_df.join(int_df, how='left', lsuffix='_out', rsuffix='_in')
data = merged_df.corr()
fig = plt.figure()
ax = fig.add_subplot(111)
cax = ax.matshow(data, interpolation='nearest')
fig.colorbar(cax)

# Label every tick from the correlation matrix itself so the names always match the cells.
ax.set_xticks(range(len(data.columns)))
ax.set_yticks(range(len(data.index)))
ax.set_xticklabels(data.columns, rotation=90)
ax.set_yticklabels(data.index)

plt.show()

<a id='sec3pt4'></a>
## <font color=darkblue>Part 4 - Present Composite of weather and environment</font>


<a href ='#top'>Jump to Table of Contents</a>

<font color=gray>step 1 - Present Composite Correlation Plot.</font>
### <font color = blue size=5>Table 3e - Display of weather data merged with environment data.</font>

In [None]:
merged_df = ext_df.join(int_df, how='left', lsuffix='_out', rsuffix='_in')
merged_df.plot()

<font color=gray>step 2 - Present Composite Correlation stats.</font>

In [None]:
# Rolling 10-sample means of the indoor readings; rows before the first full window stay NaN.
new_df['avg_temp_in'] = new_df['temp'].rolling(10).mean()
new_df['avg_hum_in'] = new_df['hum'].rolling(10).mean()
# Keep rows within 2 standard deviations: temperature first, then humidity on what remains.
new_df = new_df[abs(stats.zscore(new_df['temp'])) < 2.0]
print(abs(stats.zscore(new_df['hum'])).min())
# Copy the filtered frame so the in-place edits below do not act on a view of the original.
new_df = new_df[abs(stats.zscore(new_df['hum'])) < 2.0].copy()
# Map each Raspberry Pi's MAC address to the room it sits in.
new_df.replace({'device': {'B8:27:EB:76:5F:45': 'Living Room', 'B8:27:EB:2D:40:28': 'Master Bedroom', 'B8:27:EB:37:B0:F8': 'Guest Bedroom', 'B8:27:EB:A9:D4:C2': 'Kitchen'}}, inplace=True)
new_df.rename(columns={'device': 'room'}, inplace=True)
new_df.set_index(['sysdate'], inplace=True)
merged_df = ext_df.join(new_df, how='left', lsuffix='_out', rsuffix='_in')
merged_df.corr()
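
The z-score cleaning step can be seen in isolation with the minimal sketch below. It runs on made-up readings with two injected glitches, and the `demo` frame and `zscore` helper are hypothetical names. Unlike the cell above, which filters temperature first and then recomputes the humidity z-scores on the remaining rows, this sketch computes both masks on the unfiltered data, so results on real data could differ slightly.

```python
import numpy as np
import pandas as pd

# Made-up indoor readings with two injected sensor glitches.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "temp": rng.normal(74, 0.5, 500),
    "hum": rng.normal(50, 2.0, 500),
})
demo.loc[10, "temp"] = 95.0  # injected temperature glitch
demo.loc[20, "hum"] = 5.0    # injected humidity glitch


def zscore(series):
    """Z-score using the population standard deviation (ddof=0), the scipy.stats.zscore default."""
    return (series - series.mean()) / series.std(ddof=0)


# Keep only rows where both readings fall within 2 standard deviations of their mean.
mask = (zscore(demo["temp"]).abs() < 2.0) & (zscore(demo["hum"]).abs() < 2.0)
cleaned = demo[mask].copy()
print(len(demo), "->", len(cleaned))
```

A 2-standard-deviation cutoff also discards a small share of ordinary readings, which is worth keeping in mind when comparing record counts before and after cleaning.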


<a id='conclusions'></a>
<font color=black size=1>=============================================================================================================</font>


<font color=darkpink size=10>Conclusions</font>

<font color=black size=1>=============================================================================================================</font>

<a href ='#top'>Jump to Table of Contents</a>

<a id='section4'></a>
# <font color=darkpink>Section 1: Comparative Analysis</font>
<a href ='#top'>Jump to Table of Contents</a>

<font color=gray>step 1 - Show the correlation between humidity and temp</font>

In [None]:
print("#---------------------------------#")
print("External temp correlation to forecast")
print("#---------------------------------#")
print("External correlation:", ext_df[['temp', 'forecast']].corr())
print(" ")
print("External covariance:", ext_df[['temp', 'forecast']].cov())
print(" ")
print("#---------------------------------#")
print("External temp correlation to humidity")
print("#---------------------------------#")
print("External correlation:", ext_df[['temp', 'hum']].corr())
print(" ")
print("External covariance:", ext_df[['temp', 'hum']].cov())
print(" ")
print("#---------------------------------#")
print("Internal temp correlation to humidity")
print("#---------------------------------#")
print("Internal correlation:", int_df[['temp', 'hum']].corr())
print(" ")
print("Internal covariance:", int_df[['temp', 'hum']].cov())
print(" ")
print(" ")
print("#---------------------------------#")
print("Internal temp and humidity correlation to external temp and humidity")
print("#---------------------------------#")
ext_df.corrwith(int_df, axis=0)

<font color=gray>step 2 - Visualize internal and external variables.</font>

    
### <font color = blue size=5>Table 4a - Overlay of Weather and Environment correlating data.</font>

In [None]:
ax = mb.plot(kind="scatter", x='hum', y="temp",label="MSTR", c='b', figsize = (22,11))
ax2 = lr.plot(kind="scatter", x='hum', y="temp",label="LIVING", c='orange', ax = ax)
ax3 = gb.plot(kind="scatter", x='hum', y="temp",label="GUEST", c='r' , ax = ax)
ax4 = kt.plot(kind="scatter", x='hum', y="temp",label="KITCHEN", c='brown' , ax = ax)
ax5 = ext_atl.plot(kind="scatter", x='hum', y="temp",label="ATL", c='green' , ax = ax)
ax6 = ext_marietta.plot(kind="scatter", x='hum', y="temp",label="MARI", c='purple' , ax = ax)
ax7 = ext_stonemtn.plot(kind="scatter", x='hum', y="temp",label="STNMTN", c='gray' , ax = ax)
ax8 = ext_coke.plot(kind="scatter", x='hum', y="temp",label="COKE", c='yellow' , ax = ax)

<a id='sec4pt2'></a>
# <font color=darkpink>Section 2: Conclusions</font>
<a href ='#top'>Jump to Table of Contents</a>

In this project, we used inexpensive parts to collect indoor temperature and humidity, and used the Python programming language to query darksky.net for weather data. The weather data was cleaned only for missing values, on the assumption that the API provider was sending credible data, whereas the internal data was cleaned using z-scores, keeping readings within 2 standard deviations. 
   The correlations found were consistent with general weather and environmental properties. Depending on the amount of data pulled, with 5 days being the minimum (yielding over 600 weather records and over 7,000 environment records per day), we found a **correlation of -0.91 between the external temperature and humidity. The indoor variables had a -0.87 correlation with the external temperature and humidity**. Rolling averages of the external temperature and humidity were computed and graphed, but provided no additional insight. Despite the wide fluctuations in outdoor temperature and humidity, the indoor temperature remained within a narrow, consistent range, with no more than 2 degrees of difference between the thermostat setting and the actual indoor temperature. This suggests that the a/c cooling unit is working at expected capacity, but that ventilation and other factors led to a non-trivial disparity in cooling between rooms. **The disparity is supported by the composite correlation stats, which show only a 37% correlation between the indoor temperature and the average indoor temperature** (derived from a rolling mean).
   **These findings suggest that indoor changes would improve cooling efficiency, and that a stronger a/c unit would not add significant value**. Such changes include reducing the amount of direct sunlight entering the hottest room, restricting ventilation in rooms that cool below the thermostat setting, and adding a dehumidifier to supplement the a/c unit's work.
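
One way to decide which rooms to adjust is to measure each room's average offset from the sensor nearest the thermostat. The sketch below is only an illustration on made-up temperatures: the `offsets` values and the `temps` frame are hypothetical, and it assumes the kitchen sensor (`kt`) is an acceptable stand-in for the thermostat reading.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
index = pd.date_range("2019-07-01", periods=180, freq="min")
# Made-up offsets (in degrees) relative to the kitchen, the sensor nearest the thermostat.
offsets = {"kt": 0.0, "lr": 0.4, "mb": -2.0, "gb": 1.5}
temps = pd.DataFrame(
    {name: 74 + off + rng.normal(0, 0.2, len(index)) for name, off in offsets.items()},
    index=index,
)

# Subtract the kitchen reading row by row, then average the difference per room.
mean_offset = temps.sub(temps["kt"], axis=0).mean().sort_values()
print(mean_offset.round(2))
```

Rooms at the negative end of the list would be candidates for restricted ventilation, and rooms at the positive end for shading or other heat reduction.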

<a id='sec4pt3'></a>
# <font color=darkpink>Section 3: Limitations and Future plans</font>
<a href ='#top'>Jump to Table of Contents</a>

We admit the indoor findings are not scientifically rigorous. None of the sensors were calibrated or compared to each other; the only comparisons were against the thermostat. We nevertheless consider the findings credible because the data, although potentially off by 5%, was consistent. We were able to draw conclusions because the day-to-day readings stayed within a narrow range relative to one another. 
   About the graphs and information reported: I am not a data scientist, and my only connection to the math field is through Information Technology. I am sure I missed important points and did not present the most appropriate material, but I was only after enough to validate whether the a/c was cooling properly and whether room-specific changes were needed to even out cooling across the home.
   Not having to sink $5K into a new a/c system, I consider this a success. The next steps are to shrink the data collection elements so they are plug and play, and to replace all DHT11 sensors with AM-312, DS18B20 or DHT22 sensors. The largest challenge lies in setting up the wifi on each collection device: although not arduous within a home, anything larger could prove task intensive. The end game is to deploy the solution to a warehouse and assist in tuning the warehouse environment to improve heating and cooling. Other plans are to set up an agricultural solution, measuring temperature both outside and inside a hothouse or nursery.
   Another domestic, indoor test will commence with the onset of winter in order to study and maximize heating efficiencies.