# CS506 Project Initial Report

## Chengyu Deng, Xiaotong Niu, Qian Zhang

This Jupyter Notebook contains the code for the CS506 Initial Report. So far we have processed the main data set and prepared it for analysis.


**Part 1 Data processing**

We used Building Energy Reporting And Disclosure Ordinance (BERDO) data provided by [Analyze Boston - Boston.gov](https://data.boston.gov/) to do the analysis. 
* **_berdo2017.csv_** for the year of 2017
* **_2016-reported-energy-and-water-metrics.xlsx_** for the year of 2016
* **_2015-reported-energy-and-water-metrics.xlsx_** for the year of 2015

The source of the data is: https://data.boston.gov/dataset/building-energy-reporting-and-disclosure-ordinance


In [12]:
# Import data and put them into Pandas dataframes
import csv
import pandas as pd
import numpy as np

# 2015 xlsx -> csv (index=False avoids writing the row index as an extra "Unnamed: 0" column)
data_2015_xlsx = pd.read_excel('2015-reported-energy-and-water-metrics.xlsx', index_col=None)
data_2015_xlsx.to_csv('berdo2015.csv', index=False)

# 2016 xlsx -> csv
data_2016_xlsx = pd.read_excel('2016-reported-energy-and-water-metrics.xlsx', index_col=None)
data_2016_xlsx.to_csv('berdo2016.csv', index=False)

# Load all three years into DataFrames (the 2017 file is read as ISO-8859-1)
df_2015 = pd.read_csv('berdo2015.csv')
df_2016 = pd.read_csv('berdo2016.csv')
df_2017 = pd.read_csv('berdo2017.csv', encoding='ISO-8859-1')


# Testing code (For debugging purpose)

# print(type(df_2015))
# print(type(df_2016))
# print(type(df_2017))
# print('-------------------------------')
# print(df_2015.shape)
# print(df_2016.shape)
# print(df_2017.shape)
# print('-------------------------------')
# print(list(df_2015.columns.values))
# print('-------------------------------')
# print(list(df_2016.columns.values))
# print('-------------------------------')
# print(list(df_2017.columns.values))
# print('-------------------------------')

In [40]:
# Data Trimming

# Dataframe columns names handling
if 'Years Reported' in df_2016:
    df_2016.drop('Years Reported', axis = 1, inplace = True)
if 'Years Reported' in df_2017:
    df_2017.drop('Years Reported', axis = 1, inplace = True)

# Strip the stray leading/trailing spaces from the 2017 column names.
# rename() ignores keys that are not present, so no membership checks are needed.
df_2017.rename(columns={
    ' Gross Area (sq ft) ': 'Gross Area (sq ft)',
    ' GHG Emissions (MTCO2e) ': 'GHG Emissions (MTCO2e)',
    ' Total Site Energy (kBTU) ': 'Total Site Energy (kBTU)',
    ' Onsite Renewable (kWh) ': 'Onsite Renewable (kWh)',
}, inplace=True)

    
# Testing code (For debugging purpose)

# print(type(df_2015))
# print(type(df_2016))
# print(type(df_2017))
# print(df_2015.shape)
# print(df_2016.shape)
# print(df_2017.shape)
# print('-------------------------------')
# print(list(df_2015.columns.values))
# print('-------------------------------')
# print(list(df_2016.columns.values))
# print('-------------------------------')
# print(list(df_2017.columns.values))
# print('-------------------------------')


# Select properties which belong to BU
# The property list is tab-delimited despite its .csv extension
with open("BU_Property_List.csv", 'r', newline='') as propertyFile:
    reader = csv.reader(propertyFile, delimiter='\t')
    # Flatten the rows into a single list of property names
    propertyList = [element for row in reader for element in row]
            
df2015_BU = df_2015.loc[df_2015['Property Name'].isin(propertyList)]
df2016_BU = df_2016.loc[df_2016['Property Name'].isin(propertyList)]
df2017_BU = df_2017.loc[df_2017['Property Name'].isin(propertyList)]


# Testing code (For debugging purpose)

# print(len(propertyList))
# print('------------------------------------')
# print(propertyList)
# print('------------------------------------')
# # print(df2015_BU)
# # print(df2016_BU)
# # print(df2017_BU)
# print('------------------------------------')
# print(df2015_BU.shape)
# print(df2016_BU.shape)
# print(df2017_BU.shape)

62
------------------------------------
['785 Commonwealth Avenue (1 University Road)', '233 Bay State Road', "6-8 St. Mary's Street", '722-728 Commonwealth Avenue', '40-48 Buswell Street', '514-522 Park Drive', '110-112 Cummington Street', '700 Commonwealth Avenue', '64-86 Cummington Street', '22-24 Buswell Street', '10-18 Buswell Street', '213-217 Bay State Road', '631-639 Commonwealth Avenue', '605-615 Commonwealth Avenue (2 Silber Way)', '111 Cummington Street (664-666 Comm. Ave)', '30-38 Cummington Street', '20-28 Cummington Street (24 Cummington St)', '2 Cummington Street (23-25 Blandford Street)', '622-640 Commonwealth Avenue', '610 Commonwealth Avenue', '582-596 Commonwealth Avenue (590 Comm / 712 Bea)', '580 Commonwealth Avenue', 'Hotel Commonwealth', '631-639 Commonwealth Avenue', '605-615 Commonwealth Avenue (2 Silber Way)', '140 Bay State Road', '577-599 Commonwealth Avenue (595 Comm Ave)', '565-575 Commonwealth Avenue', '100 Bay State Road', '91 Bay State Road', '660 Beaco
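
Because `isin` matches names exactly, a stray space or a different spelling silently drops a building from the BU subsets. One way to surface such cases is to list the property names that found no match. The sketch below uses a toy frame, and `unmatched_names` is a hypothetical helper rather than part of the original notebook.

```python
import pandas as pd

def unmatched_names(df, names, column='Property Name'):
    """Return the entries of `names` that never appear in df[column], sorted."""
    present = set(df[column].dropna())
    return sorted(set(names) - present)

# Toy data: the second name carries a trailing space, so it will not match.
toy = pd.DataFrame({'Property Name': ['233 Bay State Road',
                                      '700 Commonwealth Avenue ']})
wanted = ['233 Bay State Road', '700 Commonwealth Avenue']

missing = unmatched_names(toy, wanted)
print(missing)
```

Running the same check against `df_2015`, `df_2016`, and `df_2017` with `propertyList` would show which buildings are absent from each year.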

**Part 2 Monthly energy consumption assignment**

Because BERDO reports energy use per year, we estimate each building's monthly energy consumption from the monthly energy consumption distribution provided by Kevin Zheng of Sustainability@BU.
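
As a minimal sketch of the arithmetic, assume a building's consumption in a given month is its yearly total multiplied by that month's share, and that each monthly amount is then divided between natural gas and electricity by a per-month split. The function name `monthly_breakdown`, the yearly total, and the flat share and split values are all placeholders for illustration, not the Sustainability@BU figures.

```python
import numpy as np

def monthly_breakdown(yearly_total, share, splits):
    """Split one yearly total into monthly natural gas and electricity amounts.

    share  : shape (12,), fraction of the yearly total used in each month
    splits : shape (12, 2), per-month [natural gas, electricity] fractions
    """
    share = np.asarray(share, dtype=float)
    splits = np.asarray(splits, dtype=float)
    monthly_total = yearly_total * share        # shape (12,)
    by_fuel = monthly_total[:, None] * splits   # shape (12, 2)
    return by_fuel[:, 0], by_fuel[:, 1]

# Placeholder inputs for illustration only (not real BERDO or BU figures).
yearly_total_kbtu = 1_200_000.0
flat_share = np.full(12, 1 / 12)
flat_splits = np.tile([0.6, 0.4], (12, 1))

gas, elec = monthly_breakdown(yearly_total_kbtu, flat_share, flat_splits)
```

The two returned arrays always add back up to the yearly total as long as the shares sum to 1 and each split row sums to 1.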

In [39]:
# Calculate monthly energy consumption (electricity / natural gas)

# Share of the yearly consumption that falls in each month: [Jan, Feb, Mar, ..., Nov, Dec]
share_2015 = np.array([0.2038, 0.1062, 0.0778, 0.0620, 0.0589, 0.0588, 0.0473, 0.0813, 0.0705, 0.0732, 0.0673, 0.0929])
share_2016 = np.array([0.0919, 0.1179, 0.0995, 0.0642, 0.0710, 0.0733, 0.0660, 0.0673, 0.0733, 0.0774, 0.0959, 0.1023])
share_2017 = np.array([0.1146, 0.1031, 0.0872, 0.0862, 0.0699, 0.0650, 0.0661, 0.0635, 0.0865, 0.0671, 0.0727, 0.1181])

#          Jan                               Dec
#[[Natural Gas, Electricity], ...[Natural Gas, Electricity]]
NESplits_2015 = np.array(
    [[0.6877, 0.3123], 
     [0.6654, 0.3346], 
     [0.6079, 0.3921], 
     [0.5086, 0.4914], 
     [0.3888, 0.6112], 
     [0.3663, 0.6337], 
     [0.1913, 0.8087], 
     [0.5173, 0.4827], 
     [0.5090, 0.4910], 
     [0.5598, 0.4402], 
     [0.5245, 0.4755], 
     [0.6743, 0.3257]])

NESplits_2016 = np.array(
    [[0.6519, 0.3481], 
     [0.6802, 0.3198], 
     [0.6496, 0.3504], 
     [0.5395, 0.4605], 
     [0.4350, 0.5650], 
     [0.4154, 0.5846], 
     [0.3339, 0.6661], 
     [0.3336, 0.6664], 
     [0.4442, 0.5558], 
     [0.5321, 0.4679], 
     [0.6225, 0.3775], 
     [0.6477, 0.3523]])

NESplits_2017 = np.array(
    [[0.6739, 0.3261], 
     [0.6707, 0.3293], 
     [0.6211, 0.3789], 
     [0.5920, 0.4080], 
     [0.5074, 0.4926], 
     [0.3905, 0.6095], 
     [0.3199, 0.6801], 
     [0.3178, 0.6822],
     [0.3672, 0.6328], 
     [0.3960, 0.6040],  
     [0.5386, 0.4614], 
     [0.6525, 0.3475]])


# Testing code (For debugging purpose)

# print(np.sum(share_2015))
# print(np.sum(share_2016))
# print(np.sum(share_2017))
# print('------------------------------------')
# print(np.sum(NESplits_2015))
# print(np.sum(NESplits_2016))
# print(np.sum(NESplits_2017))

# print(share_2015)
# print(share_2016)
# print(share_2017)
# print(NESplits_2015)
# print(NESplits_2016)
# print(NESplits_2017)

[ 0.2038  0.1062  0.0778  0.062   0.0589  0.0588  0.0473  0.0813  0.0705
  0.0732  0.0673  0.0929]
[ 0.0919  0.1179  0.0995  0.0642  0.071   0.0733  0.066   0.0673  0.0733
  0.0774  0.0959  0.1023]
[ 0.1146  0.1031  0.0872  0.0862  0.0699  0.065   0.0661  0.0635  0.0865
  0.0671  0.0727  0.1181]
[[ 0.6877  0.3123]
 [ 0.6654  0.3346]
 [ 0.6079  0.3921]
 [ 0.5086  0.4914]
 [ 0.3888  0.6112]
 [ 0.3663  0.6337]
 [ 0.1913  0.8087]
 [ 0.5173  0.4827]
 [ 0.509   0.491 ]
 [ 0.5598  0.4402]
 [ 0.5245  0.4755]
 [ 0.6743  0.3257]]
[[ 0.6519  0.3481]
 [ 0.6802  0.3198]
 [ 0.6496  0.3504]
 [ 0.5395  0.4605]
 [ 0.435   0.565 ]
 [ 0.4154  0.5846]
 [ 0.3339  0.6661]
 [ 0.3336  0.6664]
 [ 0.4442  0.5558]
 [ 0.5321  0.4679]
 [ 0.6225  0.3775]
 [ 0.6477  0.3523]]
[[ 0.6739  0.3261]
 [ 0.6707  0.3293]
 [ 0.6211  0.3789]
 [ 0.592   0.408 ]
 [ 0.5074  0.4926]
 [ 0.3905  0.6095]
 [ 0.3199  0.6801]
 [ 0.3178  0.6822]
 [ 0.3672  0.6328]
 [ 0.396   0.604 ]
 [ 0.5386  0.4614]
 [ 0.6525  0.3475]]
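
The commented-out prints in the cell above check the sums by eye. A small helper can make the same check automatic. This is only a sketch: `check_distribution` is a hypothetical name, the tolerance of `1e-3` is an assumption chosen to absorb four-decimal rounding, and the split array below is a placeholder rather than one of the `NESplits_*` arrays.

```python
import numpy as np

def check_distribution(share, splits, tol=1e-3):
    """Raise ValueError unless the shares and each monthly split sum to 1."""
    share = np.asarray(share, dtype=float)
    splits = np.asarray(splits, dtype=float)
    if share.shape != (12,):
        raise ValueError('expected one share per month')
    if splits.shape != (12, 2):
        raise ValueError('expected a [natural gas, electricity] pair per month')
    if abs(share.sum() - 1.0) > tol:
        raise ValueError('monthly shares should sum to 1')
    if np.any(np.abs(splits.sum(axis=1) - 1.0) > tol):
        raise ValueError('each monthly split should sum to 1')
    return True

# share_2015 as defined in the notebook; the splits here are placeholders.
share_2015 = np.array([0.2038, 0.1062, 0.0778, 0.0620, 0.0589, 0.0588,
                       0.0473, 0.0813, 0.0705, 0.0732, 0.0673, 0.0929])
placeholder_splits = np.tile([0.5, 0.5], (12, 1))

ok = check_distribution(share_2015, placeholder_splits)
```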


In [47]:
# Generate the data frame 
# print(df2015_BU.shape)
# print(df2016_BU.shape)
# print(df2017_BU.shape)
# print(list(df_2015.columns.values))
# print(df2015_BU.head(20))
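
One possible shape for this data frame is a long table with one row per building and month. The sketch below assumes the yearly figure lives in a `Total Site Energy (kBTU)` column (the name used in the 2017 file; the other years may differ) and runs on a two-row toy frame with placeholder shares and splits. `to_monthly_frame` and the output column names are hypothetical, not the authors' implementation.

```python
import numpy as np
import pandas as pd

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def to_monthly_frame(df, share, splits, energy_col='Total Site Energy (kBTU)'):
    """Expand yearly totals into one row per (building, month)."""
    share = np.asarray(share, dtype=float)
    splits = np.asarray(splits, dtype=float)
    # Non-numeric entries become NaN instead of raising.
    totals = pd.to_numeric(df[energy_col], errors='coerce').to_numpy()
    monthly = np.outer(totals, share)  # shape (n_buildings, 12)
    n = len(df)
    return pd.DataFrame({
        'Property Name': np.repeat(df['Property Name'].to_numpy(), 12),
        'Month': np.tile(MONTHS, n),
        'Natural Gas (kBTU)': (monthly * splits[:, 0]).ravel(),
        'Electricity (kBTU)': (monthly * splits[:, 1]).ravel(),
    })

# Toy input for illustration only.
toy = pd.DataFrame({'Property Name': ['Building A', 'Building B'],
                    'Total Site Energy (kBTU)': [1200.0, 2400.0]})
monthly_df = to_monthly_frame(toy, np.full(12, 1 / 12),
                              np.tile([0.6, 0.4], (12, 1)))
```

A long table like this groups and plots easily by month or by building.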

**Part 3 Visualization of monthly energy consumption**

We visualize the processed data to prepare for further analysis.
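
As one possible starting point, a stacked bar chart shows how a building's monthly consumption divides between natural gas and electricity. The sketch below assumes matplotlib is available and uses placeholder values; `plot_monthly` is a hypothetical helper, and the chart type is only a suggestion.

```python
import numpy as np
import matplotlib.pyplot as plt

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def plot_monthly(gas, elec, title='Monthly energy consumption'):
    """Draw a stacked bar chart of monthly natural gas and electricity use."""
    gas = np.asarray(gas, dtype=float)
    elec = np.asarray(elec, dtype=float)
    fig, ax = plt.subplots(figsize=(8, 4))
    x = np.arange(12)
    ax.bar(x, gas, label='Natural gas')
    ax.bar(x, elec, bottom=gas, label='Electricity')  # stacked on top of gas
    ax.set_xticks(x)
    ax.set_xticklabels(MONTHS)
    ax.set_ylabel('Energy (kBTU)')
    ax.set_title(title)
    ax.legend()
    return ax

# Placeholder values for illustration only.
ax = plot_monthly(np.full(12, 60.0), np.full(12, 40.0))
```

In a notebook the figure renders below the cell; in a script, call `plt.show()` afterwards.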

In [None]:
# Data visualization
