# Prediction of energy consumption of non-residential buildings in San Francisco

July 2019

*Project Description:*

We will use this dataset to analyse and predict the energy consumption of non-residential buildings in San Francisco, with two aims:

Which building attributes are the best predictors of energy consumption?
Can the data set be used to predict a building's energy performance (for example, its site EUI)?

*Data Description:*

This notebook analyses the energy performance report for existing commercial buildings in San Francisco. The data (including its documentation) is accessible through Kaggle: https://www.kaggle.com/san-francisco/sf-commercial-buildings-energy-performance-report/downloads/sf-commercial-buildings-energy-performance-report.zip/104

---

## Data Analysis
We start our analysis by importing required libraries:

In [2]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

%matplotlib inline
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.model_selection import train_test_split
from IPython.display import display
from sklearn import metrics

# include fast.ai libraries
from fastai.tabular import *

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

import os
pd.set_option('display.max_columns', None) # display all columns

# Any results you write to the current directory are saved as output.

['pred_Energy', '.DS_Store']


In [9]:
PATH = 'existing-commercial-buildings-energy-performance-ordinance-report.csv'
df = pd.read_csv(PATH, parse_dates = True)
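A note on `parse_dates = True`: with this setting pandas only attempts to parse the index as dates, so a column such as `Energy Audit Due Date` is not converted. The sketch below illustrates the difference on a hypothetical three-row excerpt (the `csv_text` string and the variable names are illustrative assumptions, not part of the original notebook); naming the column explicitly is one possible way to get real timestamps.

```python
import io
import pandas as pd

# Hypothetical excerpt that mimics the date format of the real file
csv_text = """Parcel(s),Energy Audit Due Date
5370/054,2014-04-01T00:00:00.000
5585/003,2020-04-01T00:00:00.000
8706/010,
"""

# parse_dates=True only tries to parse the index, so the column is left as text
as_loaded = pd.read_csv(io.StringIO(csv_text), parse_dates=True)

# Listing the column by name converts it to datetimes (missing values become NaT)
explicit = pd.read_csv(io.StringIO(csv_text), parse_dates=["Energy Audit Due Date"])

print(as_loaded["Energy Audit Due Date"].dtype)
print(explicit["Energy Audit Due Date"].dtype)
```

If the audit due date is later used as a feature, the explicit form is probably the more convenient starting point.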

Let's have a look at the data:

In [25]:
df.head()

Unnamed: 0,Parcel(s),Building Name,Building Address,Postal Code,Full.Address,Floor Area,Property Type,Property Type - Self Selected,PIM Link,Year Built,Energy Audit Due Date,Energy Audit Status,Benchmark 2018 Status,2018 Reason for Exemption,Benchmark 2017 Status,2017 Reason for Exemption,Benchmark 2016 Status,2016 Reason for Exemption,Benchmark 2015 Status,2015 Reason for Exemption,Benchmark 2014 Status,2014 Reason for Exemption,Benchmark 2013 Status,2013 Reason for Exemption,Benchmark 2012 Status,2012 Reason for Exemption,Benchmark 2011 Status,2011 Reason for Exemption,Benchmark 2010 Status,2010 Reason for Exemption,2018 ENERGY STAR Score,2018 Site EUI (kBtu/ft2),2018 Source EUI (kBtu/ft2),2018 Percent Better than National Median Site EUI,2018 Percentage Better than National Median Source EUI,2018 Total GHG Emissions (Metric Tons CO2e),2018 Total GHG Emissions Intensity (kgCO2e/ft2),2018 Weather Normalized Site EUI (kBtu/ft2),2018 Weather Normalized Source EUI (kBtu/ft2),2017 ENERGY STAR Score,2017 Site EUI (kBtu/ft2),2017 Source EUI (kBtu/ft2),2017 Percent Better than National Median Site EUI,2017 Percentage Better than National Median Source EUI,2017 Total GHG Emissions (Metric Tons CO2e),2017 Total GHG Emissions Intensity (kgCO2e/ft2),2017 Weather Normalized Site EUI (kBtu/ft2),2017 Weather Normalized Source EUI (kBtu/ft2),2016 ENERGY STAR Score,2016 Site EUI (kBtu/ft2),2016 Source EUI (kBtu/ft2),2016 Percent Better than National Median Site EUI,2016 Percentage Better than National Median Source EUI,2016 Total GHG Emissions (Metric Tons CO2e),2016 Total GHG Emissions Intensity (kgCO2e/ft2),2016 Weather Normalized Site EUI (kBtu/ft2),2016 Weather Normalized Source EUI (kBtu/ft2),2015 ENERGY STAR Score,2015 Site EUI (kBtu/ft2),2015 Source EUI (kBtu/ft2),2015 Percent Better than National Median Site EUI,2015 Percentage Better than National Median Source EUI,2015 Total GHG Emissions (Metric Tons CO2e),2015 Total GHG Emissions Intensity (kgCO2e/ft2),2015 Weather Normalized Site EUI (kBtu/ft2),2015 Weather Normalized Source EUI (kBtu/ft2),2014 ENERGY STAR Score,2014 Site EUI (kBtu/ft2),2014 Source EUI (kBtu/ft2),2014 Percent Better than National Median Site EUI,2014 Percent Better than National Median Source EUI,2014 Total GHG Emissions (Metric Tons CO2e),2014 Total GHG Emissions Intensity (kgCO2e/ft2),2014 Weather Normalized Site EUI (kBtu/ft2),2014 Weather Normalized Source EUI (kBtu/ft2),2013 ENERGY STAR Score,2013 Site EUI (kBtu/ft2),2013 Source EUI (kBtu/ft2),2013 Percent Better than National Median Site EUI,2013 Percent Better than National Median Source EUI,2013 Total GHG Emissions (Metric Tons CO2e),2013 Total GHG Emissions Intensity (kgCO2e/ft2),2013 Weather Normalized Site EUI (kBtu/ft2),2013 Weather Normalized Source EUI (kBtu/ft2),2012 ENERGY STAR Score,2012 Site EUI (kBtu/ft2),2012 Source EUI (kBtu/ft2),2012 Percent Better than National Median Site EUI,2012 Percent Better than National Median Source EUI,2012 Total GHG Emissions (Metric Tons CO2e),2012 Total GHG Emissions Intensity (kgCO2e/ft2),2012 Weather Normalized Site EUI (kBtu/ft2),2012 Weather Normalized Source EUI (kBtu/sq.ft),2011 ENERGY STAR Score,2011 Site EUI (kBtu/ft2),2011 Source EUI (kBtu/ft2),2011 Percent Better than National Median Site EUI,2011 Percent Better than National Median Source EUI,2011 Total GHG Emissions (Metric Tons CO2e),SF Find Neighborhoods,2011 Total GHG Emissions Intensity (kgCO2e/ft2),Current Police Districts,2011 Weather Normalized Site EUI (kBtu/ft2),Current Supervisor Districts,Analysis Neighborhoods,2011 Weather Normalized Source EUI (kBtu/ft2),:@computed_region_f58d_8dbm,:@computed_region_vtsz_7cme,:@computed_region_rxqg_mtj9,:@computed_region_jx4q_fizf
0,5370/054,225 INDUSTRIAL ST,225 INDUSTRIAL ST,94124,"{'longitude': '-122.405396', 'latitude': '37.7...",11200,Commercial,Other,{'url': 'http://propertymap.sfplanning.org/?&s...,1957.0,2014-04-01T00:00:00.000,Did Not Comply,Violation - Did Not Report,,Violation - Did Not Report,,Violation - Did Not Report,,Violation - Did Not Report,,Violation - Did Not Report,,Violation - Did Not Report,,Violation - Did Not Report,,Exempt,SqFt Not Subject This Year,Exempt,SqFt Not Subject This Year,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,82.0,,2.0,,9.0,1.0,,1.0,82.0,8.0,3.0
1,5585/003,Apparel Dorman 4-90,4-90 DORMAN AVE,94124,"{'longitude': '-122.403285', 'latitude': '37.7...",50422,Commercial,Non-Refrigerated Warehouse,{'url': 'http://propertymap.sfplanning.org/?&s...,1947.0,2020-04-01T00:00:00.000,Upcoming,Complied,,Complied,,Complied,,Complied,,Complied,,Complied,,Complied,,Complied,,Complied,,69.0,15.1,28.1,-31.6,-31.6,46.5,0.9,16.8,31.2,83.0,14.0,28.0,-46.5,-46.5,44.8,0.9,15.1,30.0,77.0,15.2,31.6,-37.9,-37.9,49.3,1.0,16.6,33.0,100.0,8.3,8.3,-91.0,-91.0,34.2,0.7,8.3,8.3,100.0,9.6,9.6,-89.4,-89.4,39.7,0.8,11.5,11.5,100.0,9.4,9.4,-90.3,-90.3,38.7,0.8,9.4,9.4,99.0,21.0,21.0,-78.3,-78.3,86.3,1.7,21.0,21.0,99.0,22.4,22.4,-77.2,-77.2,92.2,82.0,1.8,2.0,22.4,9.0,1.0,22.4,1.0,82.0,8.0,3.0
2,8706/010,Avalon At Mission Bay (Tower 1),255 King St,94107,"{'longitude': '-122.39278', 'latitude': '37.77...",250908,Multifamily,,{'url': 'http://propertymap.sfplanning.org/?&s...,2003.0,,,Upcoming,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,34.0,,1.0,,10.0,4.0,,20.0,34.0,9.0,2.0
3,7521/002-C,Diamond Heights Shopping Center - Bldg A - Saf...,5290 DIAMOND HEIGHTS BLVD,94131,"{'longitude': '-122.438211', 'latitude': '37.7...",35853,Commercial,Supermarket/Grocery Store,{'url': 'http://propertymap.sfplanning.org/?&s...,1964.0,2023-04-01T00:00:00.000,Upcoming,Complied,,Complied,,Complied,,Complied,,Exempt,,Exempt,,Exempt,,Exempt,,Exempt,,44.0,202.1,566.0,6.0,6.0,510.5,14.2,202.1,566.0,10.0,199.5,626.4,37.6,37.6,542.5,15.1,199.5,626.4,8.0,224.3,677.1,40.4,40.4,682.0,19.0,224.3,677.1,9.0,227.4,664.2,38.1,38.1,640.5,17.9,227.0,660.2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,57.0,,9.0,,5.0,10.0,,7.0,57.0,5.0,4.0
4,4811/003,Evergood Sausage Co.,1389 UNDERWOOD AVE,94124,"{'longitude': '-122.387882', 'latitude': '37.7...",35000,Commercial,Manufacturing/Industrial Plant,{'url': 'http://propertymap.sfplanning.org/?&s...,1966.0,2019-04-01T00:00:00.000,Complied,Complied,,Complied,,Complied,,Complied,,Complied,,Complied,,Complied,,Exempt,SqFt Not Subject This Year,Exempt,SqFt Not Subject This Year,,458.6,783.2,,,957.1,27.3,458.6,783.2,,479.5,872.2,,,1031.8,29.5,479.5,872.2,,444.5,834.2,,,1033.5,29.5,443.1,829.8,,384.1,731.3,,,870.1,24.9,381.6,723.6,,428.0,815.0,,,969.7,27.7,422.5,797.8,,455.1,840.6,,,1057.0,30.2,459.0,852.7,,442.4,815.7,,,989.6,28.3,442.4,815.7,,,,,,,86.0,,2.0,,9.0,1.0,,1.0,86.0,8.0,3.0


In [29]:
df.columns[4]

'Full.Address'

Now, we will drop some columns that don't seem to be useful for our analysis:

In [32]:
df_drop = df.drop(df[df.columns[4]],inplace = True)

KeyError: '[\'{\\\'longitude\\\': \\\'-122.405396\\\', \\\'latitude\\\': \\\'37.738345\\\', \\\'human_address\\\': \\\'{"address": "225 INDUSTRIAL ST", "city": "SAN FRANCISCO", "state": "CA", "zip": "94124"}\\\'}\'\n \'{\\\'longitude\\\': \\\'-122.403285\\\', \\\'latitude\\\': \\\'37.739578\\\', \\\'human_address\\\': \\\'{"address": "50 Dorman Ave", "city": "SAN FRANCISCO", "state": "CA", "zip": "94124"}\\\'}\'\n \'{\\\'longitude\\\': \\\'-122.39278\\\', \\\'latitude\\\': \\\'37.777164\\\', \\\'human_address\\\': \\\'{"address": "255 King St", "city": "SAN FRANCISCO", "state": "CA", "zip": "94107"}\\\'}\'\n \'{\\\'longitude\\\': \\\'-122.438211\\\', \\\'latitude\\\': \\\'37.743846\\\', \\\'human_address\\\': \\\'{"address": "5290 Diamond Heights Blvd", "city": "SAN FRANCISCO", "state": "CA", "zip": "94131"}\\\'}\'\n ...\n \'{\\\'longitude\\\': \\\'-122.388989\\\', \\\'latitude\\\': \\\'37.750152\\\', \\\'human_address\\\': \\\'{"address": "1100 CESAR CHAVEZ ST", "city": "SAN FRANCISCO", "state": "CA", "zip": "94107"}\\\'}\'\n \'{\\\'longitude\\\': \\\'-122.410188\\\', \\\'latitude\\\': \\\'37.762517\\\', \\\'human_address\\\': \\\'{"address": "1940 Bryant St", "city": "SAN FRANCISCO", "state": "CA", "zip": "94110"}\\\'}\'\n \'{\\\'longitude\\\': \\\'-122.392136\\\', \\\'latitude\\\': \\\'37.781037\\\', \\\'human_address\\\': \\\'{"address": "35 STANFORD ST", "city": "SAN FRANCISCO", "state": "CA", "zip": "94107"}\\\'}\'\n \'{\\\'longitude\\\': \\\'-122.391642\\\', \\\'latitude\\\': \\\'37.787085\\\', \\\'human_address\\\': \\\'{"address": "333 Harrison ST", "city": "SAN FRANCISCO", "state": "CA", "zip": "94105"}\\\'}\'] not found in axis'
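The `KeyError` arises because `df[df.columns[4]]` hands `drop` the column's values rather than its label, so pandas looks for those address strings among the row labels and finds none. In addition, `inplace = True` makes `drop` return `None`, so `df_drop` would be empty even if the call succeeded. A minimal sketch of one possible fix follows; `df_small` is a hypothetical miniature stand-in for the real frame, and its address strings are shortened placeholders.

```python
import pandas as pd

# Hypothetical miniature stand-in for the real DataFrame (placeholder values)
df_small = pd.DataFrame({
    "Parcel(s)": ["5370/054", "5585/003"],
    "Postal Code": [94124, 94124],
    "Full.Address": ["{'longitude': '-122.405396'}", "{'longitude': '-122.403285'}"],
    "Floor Area": [11200, 50422],
})

# Reproduce the failure: the column's values are treated as row labels to drop
try:
    df_small.drop(df_small["Full.Address"])
except KeyError as err:
    print("KeyError:", err)

# Pass the column label via columns= and keep the returned copy (no inplace)
df_drop = df_small.drop(columns=["Full.Address"])
print(df_drop.columns.tolist())
```

On the full data the equivalent call would be `df_drop = df.drop(columns=df.columns[4])`, which leaves `df` untouched and returns a new frame without `Full.Address`.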