# Flask Playground

### This notebook contains Python code used to pull information from Excel spreadsheets and load it into InfluxDB. Computation is done over the data set and the results are stored in MongoDB. In the future this code would be moved into the Flask API, but to do this the full Selenium Python scraper would first need to be built out.

In [1]:
import sys
import json
import datetime
import time
import random
import pprint
from difflib import get_close_matches

import numpy as np
import pandas as pd
# import matplotlib.pyplot as plt
from bson.objectid import ObjectId
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
from influxdb import InfluxDBClient
from influxdb import DataFrameClient
from pymongo import MongoClient

pp = pprint.PrettyPrinter(indent=4)

## Configure Mongodb

In [2]:
mongoClient = MongoClient('95.179.179.222',27017)
db = mongoClient.witsECOMP

## Configure InfluxDB

In [3]:
databaseName = "timeseriesdata"
influxClient = DataFrameClient("95.179.179.222", 8086, "root", "root", database=databaseName)
influxClient.create_database(databaseName)

In [None]:
# influxClient.drop_database(databaseName)

## Read in time series from the Excel spreadsheet
Note: this process can take a long time (2 to 3 minutes!).

In [None]:
spreadsheet = pd.read_excel('/home/chris/Desktop/Wits Data/WITS ecWIN7 Data/WITS ecWIN7 Data 2018.xlsx',
                            sheet_name='WITS ecWIN7 Data 2018',
                            index_col="DateTime"
                           )

## Read in building names and geojson information
These files were created manually by generating geojson at http://geojson.io/ and identifying the corresponding building names.

In [None]:
buildingSensorNames = spreadsheet.columns
print(buildingSensorNames)

In [None]:
loadedGeoJson = {}
with open('../../Assets/geojson/witsMainCampusGeojson.json') as json_data:
    loadedGeoJson = json.load(json_data)
    print(loadedGeoJson)

In [None]:
buildingNamesStore = db.buildingNames
#Iterate over all buildings defined in the geojson and, for each, find the most likely sensor in the Excel spreadsheet.
#This is effectively a join between the two sets, based on weak links (the names).
for index, feature in enumerate(loadedGeoJson["features"]):
    buildingName = feature["properties"]["buildingName"]
    #Find the most likely sensor for that particular building
    mostLikelySensor = get_close_matches(buildingName + "_kVA", buildingSensorNames)[0]
    dataFrame = spreadsheet[mostLikelySensor].to_frame()
    influxClient.write_points(dataFrame, databaseName, {'buildingNumber': index, 'buildingName': buildingName, 'sensorName': mostLikelySensor})
    print(buildingName + "->" + mostLikelySensor)
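
The `[0]` lookup in the cell above raises an `IndexError` whenever `get_close_matches` finds nothing above its similarity cutoff. A minimal sketch of a guarded lookup follows; the helper name `find_sensor` and the sample sensor names are hypothetical and only illustrate the idea.

```python
from difflib import get_close_matches

def find_sensor(building_name, sensor_names, suffix="_kVA", cutoff=0.6):
    """Return the closest sensor column for a building, or None if nothing is close enough."""
    matches = get_close_matches(building_name + suffix, list(sensor_names), n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical sensor columns, for illustration only
sensors = ["Main Library_kVA", "Physics Building_kVA", "Sports Hall_kVA"]

print(find_sensor("Physics", sensors))   # closest column for a shortened name
print(find_sensor("Zzz", sensors))       # no candidate passes the cutoff
```

Returning `None` lets the caller decide whether to skip the building or stop with a clear error, rather than failing on an empty list.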

## Perform queries to load aggregated data into MongoDB

These queries MUST pull from InfluxDB, as this is where all the time series data is stored. As new records arrive from Selenium (or directly from new sensors) they will be added to InfluxDB. This process of querying InfluxDB and updating MongoDB runs every 30 minutes to generate "fresh" graphs.

First, we need to make the database store for each building

In [None]:
buildingInformationStore = db.buildingInformation

Then, iterate over all the buildings identified in the geojson and query InfluxDB for that building's values over the predefined length of time. The results of each query are resampled to generate averages per day, week and year. The maximum for each period (for example, the maximum building week) is also extracted. This value is used later to generate the heat map.


In [None]:
buildingNamesStore = db.buildingNames
startTime = "2018-08-19T00:00:00+00:00"
buildingArray = []

#Define a structure used in the creation of tables. Iterate over each entry to define the graphs to draw
plotsToDraw = {
    'Day':{
        "resampleLength":'1H',
        "windowLength": 2*24, #1 day window with 2 samples per hour
        "moduloSize": 24
        },
    'Week':{
        "resampleLength":'1D',
        "windowLength":2*24*7, #1 week window with 2 samples per hour
        "moduloSize": 7
        },
    'Year':{
        "resampleLength":'1M',
        "windowLength":2*24*30*8, #"Year" window of 8 months of 30 days, with 2 samples per hour
        "moduloSize": 8
        },
}

for index, feature in enumerate(loadedGeoJson["features"]):
    buildingName = feature["properties"]["buildingName"]
    #This blob follows the data structure defined by swagger
    buildingBlob = {
        "BuildingId": index,
        "BuildingRank": 0,
        "BuildingName": buildingName,
        "SensorName": "SENSOR",
        "ChartInformation": {
            "DayInformation": {
                "LastDay": [],
                "AverageDay": [],
                "LastDayAverage":0,
                "MaximumDay":0
            },
            "WeekInformation": {
                "LastWeek": [],
                "AverageWeek":[],
                "LastWeekAverage": 0,
                "MaximumWeek":0
            },
            "YearInformation": {          
                "LastYear": [],
                "AverageYear":[],
                "LastYearAverage":0,
                "MaximumYear":0
            }
        },
    }
    for plotType in plotsToDraw:
        #We want to get all the time series data for that particular building. The start time would be removed if live
        #data were added to the data set, as we would then query from the current time back to the beginning of the set
        query = "SELECT * FROM timeseriesdata WHERE buildingName='{}' AND time< '{}'".format(buildingName, startTime)
        queryResults = influxClient.query(query)
        #We need to extract the time series data from the influx query results
        results = queryResults["timeseriesdata"][queryResults["timeseriesdata"].keys()[0]]
        #For the resampled region (to calculate the last day/week/year) we don't need the full set, so take a
        #subset using tail (to get the last x entries)
        resampledResults = results.tail(plotsToDraw[plotType]['windowLength']).resample(plotsToDraw[plotType]['resampleLength'], label='right').sum()
        #We can now calculate the average over the resampled region for the last period
        buildingBlob["ChartInformation"][plotType+"Information"]["Last" + plotType + "Average"]=resampledResults.mean()
        
        #Next, we want to convert the results into a format that mongo can accept. We cast them to a dict
        #and then iterate over all results, using the time stamp string (to the second) as the key
        resultsDict = resampledResults.to_dict()
        formattedDict = {} 
        for result in resultsDict:
            formattedDict[str(result)[0:19]] = resultsDict[result]
        buildingBlob["ChartInformation"][plotType+"Information"]["Last" + plotType] = formattedDict
        
        #The last step is to calculate the period average over all samples (e.g. the average week, over all data points)
        
        resampledResults_totalSet = results.resample(plotsToDraw[plotType]['resampleLength'], label='right').sum()

        buildingBlob["ChartInformation"][plotType+"Information"]["Maximum"+plotType]=np.max(resampledResults_totalSet)
        resampledResults_totalSetDict = resampledResults_totalSet.to_dict()
        
        #Bucket the resampled values by their position within the period (position modulo moduloSize)
        timeSeriesValues =  [[] for _ in range(plotsToDraw[plotType]['moduloSize'])]
        for sampleIndex, result in enumerate(resampledResults_totalSetDict):
            relativeIndex = sampleIndex % plotsToDraw[plotType]['moduloSize']
            timeSeriesValues[relativeIndex].append(resampledResults_totalSetDict[result])
        
        finalizedAverage = []
        for result in timeSeriesValues:
            finalizedAverage.append(np.array(result).mean())
  
        #In order to plot these values correctly, they need to be keyed by the same time stamps as the last period (e.g. the last week)
        averageResultsDict = {}
        for keyIndex, key in enumerate(formattedDict):
            averageResultsDict[key] = finalizedAverage[keyIndex]
        buildingBlob["ChartInformation"][plotType+"Information"]["Average" + plotType]=averageResultsDict

        
    buildingArray.append(buildingBlob)
    print("Building: {} added with id {}".format(buildingName, buildingBlob["BuildingId"]))

# The last step is to rank the buildings relative to each other by their last-year average, highest first
sortedArray = sorted(buildingArray, key=lambda k: k['ChartInformation']['YearInformation']['LastYearAverage'], reverse=True)


for index, element in enumerate(sortedArray):
    buildingId = element["BuildingId"]
    buildingName = element["BuildingName"]
    element["BuildingRank"] = index
    buildingInformation_id = buildingInformationStore.insert_one(element).inserted_id
    buildingNamesStore.insert_one({"buildingName": buildingName, "BuildingId": buildingId}).inserted_id
    print("record added to db")
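
The "average period" calculation in the cell above groups resampled values by their position modulo the period length and then averages each group. The sketch below shows that idea with the standard library only; the function name `average_by_slot` and the sample numbers are hypothetical.

```python
from statistics import mean

def average_by_slot(values, slots):
    """Average every value that falls in the same position of a repeating cycle."""
    buckets = [[] for _ in range(slots)]
    for position, value in enumerate(values):
        buckets[position % slots].append(value)
    # Leave a slot as None when the series is too short to fill it
    return [mean(bucket) if bucket else None for bucket in buckets]

# Two hypothetical weeks of daily totals (14 values, 7 slots)
daily_totals = [10, 12, 11, 13, 12, 6, 5,
                14, 12, 13, 11, 12, 8, 7]
weekly_profile = average_by_slot(daily_totals, 7)
print(weekly_profile)
```

Grouping by position assumes the series starts on a period boundary and has no gaps. If it does not, slot 0 will not line up with the same hour or weekday in every cycle, so grouping on the time stamp itself may be the safer choice.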

# Calculate the university-wide metrics
We want to be able to view the relative values for each set. These are averages over all time for all buildings. We already know what the average for each building is from before so we just need to find the average over all buildings for those particular sets.

In [None]:
buildingInformation = list(buildingInformationStore.find({}))

campusInformation = {
    "AveragePastDay": {},
    "AveragePastWeek": {},
    "AveragePastYear": {},
    "Maximums": {
        "Day": 0,
        "Week": 0,
        "Year": 0
    },
    "MaximumLastPeriodAverage": {
        "Day": 0,
        "Week": 0,
        "Year": 0
    }
}
#Iterate through all the plots that need averages
for plotType in plotsToDraw:
    #For each building, extract the relevant time series values to average over.
    #Each position in the timeSeriesValues array corresponds to one sample of the period (for a week it is an array of 7 days, for example)
    timeSeriesValues =  [[] for _ in range(plotsToDraw[plotType]['moduloSize'])]
    #Iterate over all buildings
    for building in buildingInformation:
        specificBuildingValues = building["ChartInformation"][plotType+"Information"]["Average" + plotType].values()
        for index, value in enumerate(specificBuildingValues):
            timeSeriesValues[index].append(value)
    
    #Calculate each average and assign the time stamp values as keys.
    #Note: `building` is the last building from the loop above; all buildings are assumed to share the same keys
    for index, key in enumerate(building["ChartInformation"][plotType+"Information"]["Average" + plotType].keys()):
        numpyArray = np.array(timeSeriesValues[index])
        campusInformation["AveragePast"+plotType][key] = numpyArray.mean()
#         print(numpyArray.mean())
    
#         campusInformation["Maximums"][plotType] = np.max(numpyArray)
#         print(campusInformation["AveragePast"+plotType][key])
    
    #Find the maximum and MaximumLastPeriodAverage for each time period and assign them to the struct
    for building in buildingInformation:
        buildingMax = building["ChartInformation"][plotType+"Information"]["Maximum"+plotType]
        if buildingMax > campusInformation["Maximums"][plotType]:
            campusInformation["Maximums"][plotType] = buildingMax
        buildingAverage = building["ChartInformation"][plotType+"Information"]["Last"+plotType+"Average"]
        if buildingAverage > campusInformation["MaximumLastPeriodAverage"][plotType]:
            campusInformation["MaximumLastPeriodAverage"][plotType] = buildingAverage
print(campusInformation)
campusInfoStore = db.campusInfo
campusInfo_id = campusInfoStore.insert_one(campusInformation).inserted_id
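
The campus-wide average above takes each building's per-slot averages, lines them up by position, and averages across buildings. A minimal sketch of that step, assuming every building stores its slots in the same order (the function name `campus_average` and the sample data are hypothetical):

```python
from statistics import mean

def campus_average(per_building):
    """Average each time slot across buildings.

    per_building is a list of {timestamp: value} dicts, one per building.
    Slots are matched by position, and the keys of the last building label the result.
    """
    keys = list(per_building[-1].keys())
    columns = zip(*(building.values() for building in per_building))
    return {key: mean(column) for key, column in zip(keys, columns)}

# Two hypothetical buildings with two daily slots each
buildings = [
    {"2018-08-17 00:00:00": 10.0, "2018-08-18 00:00:00": 20.0},
    {"2018-08-17 00:00:00": 30.0, "2018-08-18 00:00:00": 40.0},
]
campus = campus_average(buildings)
print(campus)
```

Because slots are matched by position rather than by key, a building with missing or extra time stamps would shift its values into the wrong slot. Matching on the time stamp keys would avoid that at the cost of a little more code.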

# Generate geojson for each location
Colours need to be assigned based on each building's magnitude.
First, define a set of functions to convert a minimum, maximum and value into an RGB value. This value defines the building's colour on the final heat map.

In [None]:
def rgb(minimum, maximum, value):
    minimum, maximum = float(minimum), float(maximum)
    ratio = 2 * (value-minimum) / (maximum - minimum)
    b = int(max(0, 255*(1 - ratio)))
    r = int(max(0, 255*(ratio - 1)))
    g = 255 - b - r
    return r, g, b

def getHex(minimum,maximum,value):
    return '#%02x%02x%02x' % rgb(minimum,maximum,value)

In [None]:
getHex(0,10,5)
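
The mapping above runs from blue at the minimum, through green at the midpoint, to red at the maximum. As written it divides by zero when `maximum == minimum`, and it produces out-of-range channels when `value` lies outside the range. The sketch below is one possible guarded variant, not the notebook's own function; the names `rgb_clamped` and `get_hex_clamped` are hypothetical.

```python
def rgb_clamped(minimum, maximum, value):
    """Map value in [minimum, maximum] onto a blue -> green -> red scale."""
    minimum, maximum = float(minimum), float(maximum)
    if maximum == minimum:
        ratio = 1.0  # flat range: fall back to the midpoint colour (green)
    else:
        ratio = 2 * (value - minimum) / (maximum - minimum)
    ratio = min(2.0, max(0.0, ratio))  # clamp values that fall outside the range
    b = int(max(0, 255 * (1 - ratio)))
    r = int(max(0, 255 * (ratio - 1)))
    g = 255 - b - r
    return r, g, b

def get_hex_clamped(minimum, maximum, value):
    return '#%02x%02x%02x' % rgb_clamped(minimum, maximum, value)

for sample in (0, 5, 10, 15):
    print(sample, get_hex_clamped(0, 10, sample))
```

With a range of 0 to 10, the values 0, 5 and 10 map to pure blue, green and red respectively, and 15 is clamped to the same red as 10.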

Next, we iterate over the geojson set and update each building's heat colour based on the set maximum.

In [None]:
# We want to give the heat map colours that show the consumption averages and maximums over a day, week and year
groupings = ["Average","Maximum"]
for group in groupings:
    for plotType in plotsToDraw:
        for index, feature in enumerate(loadedGeoJson["features"]):
            buildingName = feature["properties"]["buildingName"]
            if group =="Average":
                value = buildingInformationStore.find_one({"BuildingName": buildingName})['ChartInformation'][plotType + 'Information']["Last" +plotType + group]
                maximum = campusInformation["MaximumLastPeriodAverage"][plotType]
            else:
                maximum = campusInformation["Maximums"][plotType]
                value = buildingInformationStore.find_one({"BuildingName": buildingName})['ChartInformation'][plotType + 'Information'][group + plotType]
            newColour = getHex(0,
                               maximum,
                               value)
            print(maximum,value,newColour)
#             print(buildingInformationStore.find_one({"BuildingName": buildingName})['ChartInformation'][plotType + 'Information']['Last'+ plotType + group])
            loadedGeoJson["features"][index]["properties"][plotType + "Style_" + group]["fillColor"] = newColour
            loadedGeoJson["features"][index]["properties"]["buildingId"] = index #set the id of the building based on the previous indices
print(loadedGeoJson)
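
The assignment to `[plotType + "Style_" + group]["fillColor"]` above assumes every feature already carries a style dictionary under that key, and raises a `KeyError` otherwise. A small sketch of a more forgiving update follows; the helper `set_fill_colour` and the sample feature are hypothetical.

```python
def set_fill_colour(feature, plot_type, group, colour):
    """Set the fill colour for one heat-map style, creating the style dict if it is missing."""
    style_key = plot_type + "Style_" + group
    style = feature["properties"].setdefault(style_key, {})
    style["fillColor"] = colour
    return feature

# Hypothetical feature without any predefined style entries
feature = {
    "type": "Feature",
    "properties": {"buildingName": "Example Building"},
    "geometry": None,
}
set_fill_colour(feature, "Week", "Average", "#00ff00")
print(feature["properties"])
```

Existing style entries are left intact, because `setdefault` only creates the dictionary when the key is absent.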
    

# Insert the geojson into the database
Now that the heat map colours have been calculated, we can insert the geojson into the database.

In [None]:
geojsonStore = db.geojson
geojson_id = geojsonStore.insert_one(loadedGeoJson).inserted_id

In [None]:
loadedCampusInfo = {}
with open('../../Assets/otherinformation/campusInfomation.json') as json_data:
    loadedCampusInfo = json.load(json_data)
    print(loadedCampusInfo)

In [None]:
loadedGeoJson = {}
with open('../../Assets/geojson/witsMainCampusGeojson.json') as json_data:
    loadedGeoJson = json.load(json_data)
    print(loadedGeoJson)

In [None]:
BuildingChartInformation = {}
with open('../../Assets/ChartInformation/BuildingChartInformation.json') as json_data:
    BuildingChartInformation = json.load(json_data)
    print(BuildingChartInformation)

In [None]:
buildingInformationStore = db.buildingInformation
buildingInformation_id = buildingInformationStore.insert_one(BuildingChartInformation).inserted_id

In [None]:
list(buildingInformationStore.find({"buildingId": "0"}))