# Benchmark calculations for January and June proposals

**A Note About Source Data**

We ran our original calculations using RMI's building data. The first set of models in this notebook uses the publicly available building data from 2019, with calculations added to reproduce the new fields created in the RMI dataset. This gave us heavily skewed results. We don't yet know whether the two datasets genuinely differ from one another or whether there is a calculation error. This needs to be investigated.

At the bottom of the notebook, the same proposals are remodelled with RMI's building data, which gives results consistent with what we've seen in the past.


**Assumptions**:

- Buildings will emit the maximum amount allowed by the target GHGI, **unless** their predicted emissions are lower than that, in which case they will emit the predicted amount.
- Predicted emissions are calculated from the emissions predictions found in [the emissions data input file](../data/input_data/energy_emissions.csv).


For each building, in each year (2027-2050), we need to calculate the fields below (column references are to the 2027 calculations in [the original RMI spreadsheet](https://docs.google.com/spreadsheets/d/175uipAHHQHGelq7i1n9sKWQXNQi-B1IJiF6-XQRVnE8/edit#gid=1811888818)):

- `city_ghgi_target`: The GHGI target (col A), calculated as the sum, over the three given use types, of `city's target GHGI for that use type * percent of GFA for that use type`. If none of the use types has a target, this is NaN. If a building is multi-use and only some of its uses have a compliance threshold, use the expected GHGI (greenhouse gas emissions intensity) if nothing is changed as the GHGI for the portion of the building that is not yet subject to BEPS.
    - Example: A building is 50% retail and 50% multifamily housing. The target for retail in 2033 is 1.03 and there is no target for multifamily housing. This building would have a GHGI of 4.0 in 2033 if no changes were made to it. We estimate the GHGI target as `(0.5 * 1.03) + (0.5 * 4.0) = 2.515`.
    - NB: These numbers will differ from the RMI calculations. The RMI model mistakenly used a GHGI of zero for parts of buildings that are not yet subject to BEPS. So in the example above, RMI's model would list the GHGI target as `0.5 * 1.03 + 0.5 * 0 = 0.515`, which leaves out half the building. The target GHGI would then actually go **up** when the multifamily part of the building became subject to BEPS! This model corrects that error.
- `expected_baseline`: The expected emissions if nothing is changed about the building, as calculated by the sum of: `total use energy for type * energy emissions factor for energy type` for the three energy types (col C)
- `expected_baseline_ghgi`: The expected GHGI if nothing is changed about the building (col B), as calculated by the `expected emissions / total GFA`
- `compliant_ghgi`: The expected GHGI if the building is compliant with BEPS, as defined by (col H):
    - if the BEPS GHGI target is lower than the expected GHGI, use the BEPS GHGI target
    - if the expected GHGI is lower than the BEPS GHGI target, use the expected GHGI
- `compliant_emissions`: The expected emissions if the building is compliant with BEPS, as defined by the `compliant GHGI * total GFA` (col J)
- `compliance_status`: Whether or not the building is compliant (col K):
    - yes: the baseline GHGI is lower than the expected compliant GHGI for this year
    - no: the baseline GHGI is higher than the expected compliant GHGI for this year
    - no requirement yet: the building doesn't have a compliance requirement for this year
- `compliance_fees`: Noncompliance fees. For years where buildings will be taxed for being noncompliant, this is `$2.50 * total GFA`    
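
The per-building fields above can be sketched in plain Python. This is a minimal illustration, not the code inside `BaselineBEPSModel`: the function names, the dictionary inputs, and the 100,000 sq ft GFA are hypothetical, and two choices are assumptions on my part: a building with no target keeps its baseline GHGI, and the compliance status compares the baseline GHGI with the blended target.

```python
import math


def blended_ghgi_target(use_shares, targets, baseline_ghgi):
    """Blend per-use-type targets by share of GFA.

    use_shares: {use_type: fraction of GFA}
    targets:    {use_type: target GHGI} for use types that have a target this year
    Use types without a target contribute the baseline GHGI instead of zero.
    """
    if not any(use in targets for use in use_shares):
        return math.nan  # no use type has a target yet
    return sum(
        share * targets.get(use, baseline_ghgi)
        for use, share in use_shares.items()
    )


def compliant_ghgi(target, baseline_ghgi):
    """Lower of the target and the baseline GHGI."""
    if math.isnan(target):
        return baseline_ghgi  # assumption: no target means no change
    return min(target, baseline_ghgi)


def compliance_status(target, baseline_ghgi):
    """Assumed reading of the status rule: compare baseline with the target."""
    if math.isnan(target):
        return "no requirement yet"
    return "yes" if baseline_ghgi <= target else "no"


# Worked example from the text: 50% retail, 50% multifamily, 2033.
shares = {"retail": 0.5, "multifamily": 0.5}
targets_2033 = {"retail": 1.03}  # multifamily has no target yet
baseline = 4.0
total_gfa = 100_000  # hypothetical square footage

target = blended_ghgi_target(shares, targets_2033, baseline)
ghgi = compliant_ghgi(target, baseline)
emissions = ghgi * total_gfa  # compliant_emissions = compliant GHGI * total GFA
status = compliance_status(target, baseline)
```

With these inputs the blended target matches the 2.515 from the example, and the building counts as noncompliant because its baseline GHGI of 4.0 is above that target.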

In [1]:
import pandas as pd
import numpy as np

In [2]:
from baseline_model import BaselineBEPSModel

In [3]:
pd.options.mode.chained_assignment = None

In [4]:
JAN_TARGETS_PATH = '../data/input_data/jan_proposal_emissions_targets.csv'
JUNE_TARGETS_PATH = '../data/input_data/june_proposal_emissions_targets.csv'
EMISSIONS_PATH = '../data/input_data/energy_emissions.csv'
BUILDING_DATA_PATH = '../data/input_data/cleaned_building_data_with_policy_gfa.csv'

JAN_FINE_YEARS = [2027, 2030, 2035, 2040, 2045, 2050]
JUNE_FINE_YEARS = [2030, 2035, 2040, 2045, 2050]
FINE_PER_SQ_FT = 2.5
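
As a rough illustration of how the fine years and the per-square-foot fine combine into `compliance_fees`, here is a small sketch. The helper `compliance_fee` is hypothetical and not part of `BaselineBEPSModel`; it assumes that compliant buildings, and all buildings in non-fine years, pay nothing.

```python
JAN_FINE_YEARS = [2027, 2030, 2035, 2040, 2045, 2050]
JUNE_FINE_YEARS = [2030, 2035, 2040, 2045, 2050]
FINE_PER_SQ_FT = 2.5


def compliance_fee(year, total_gfa, is_compliant, fine_years,
                   fine_per_sq_ft=FINE_PER_SQ_FT):
    """Return the fee for one building in one year (hypothetical helper)."""
    if year in fine_years and not is_compliant:
        return fine_per_sq_ft * total_gfa
    return 0.0  # assumption: no fee outside fine years or when compliant


# A noncompliant 100,000 sq ft building (made-up size) in 2027:
jan_fee_2027 = compliance_fee(2027, 100_000, False, JAN_FINE_YEARS)
june_fee_2027 = compliance_fee(2027, 100_000, False, JUNE_FINE_YEARS)
```

Under the January proposal this building would owe 250,000 dollars in 2027, while under the June proposal it would owe nothing until 2030.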

## January Proposal

In [None]:
jan_model = BaselineBEPSModel(EMISSIONS_PATH, JAN_TARGETS_PATH, BUILDING_DATA_PATH, JAN_FINE_YEARS, FINE_PER_SQ_FT)

In [None]:
jan_model.calculate_baseline_model(2027, 2050)

In [None]:
jan_model.scenario_results.head()

## Summary stats for January Proposal

In [None]:
jan_model.get_total_emissions_by_year()

In [None]:
jan_model.emissions_by_year

In [None]:
# Calculate percent of CO2 reduction by 2040

jan_model.get_percent_emissions_reduction_by_given_year(2040)

## June proposal

In [None]:
june_model = BaselineBEPSModel(EMISSIONS_PATH, JUNE_TARGETS_PATH, BUILDING_DATA_PATH, JUNE_FINE_YEARS, FINE_PER_SQ_FT)

In [None]:
june_model.calculate_baseline_model(2027, 2050)

In [None]:
june_model.scenario_results.head()

## Summary stats for June

In [None]:
june_model.get_total_emissions_by_year()

In [None]:
june_model.emissions_by_year

In [None]:
june_model.get_percent_emissions_reduction_by_given_year(2040)

## Comparing January and June

In [None]:
june_model.scenario_results['compliant_emissions'].sum()

In [None]:
jan_model.scenario_results['compliant_emissions'].sum()

That's twice the emissions for June compared with January, so something is wrong. Let's look at the original RMI building spreadsheet.

## Proposals with Original RMI Spreadsheet

To verify that the model itself works correctly, let's run the numbers again with the original RMI spreadsheet. If those results look right, we can dig into what is wrong with the numbers we calculated from the public city data.

Two things were added to the RMI spreadsheet for this run:

- building classification (A, B, C, etc. based on GFA that is subject to the policy)
- updated column names to match the model: 'Total_GFA' -> 'Total GFA for Policy', 'Steam(kBtu)' -> 'SteamUse(kBtu)', and 'percent_sqft_1st' -> 'LargestPropertyUseType Percent GFA', etc.
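
A minimal sketch of the renaming step with pandas, using only the three mappings named in the bullet above (the remaining ones behind "etc." are not reproduced here). The one-row frame is made-up data. Passing `errors="raise"` is a design choice rather than something the notebook does: it makes pandas fail loudly if an expected source column is missing, which helps catch mislabeled columns early.

```python
import pandas as pd

# Mapping as listed in the text; other renames are omitted.
COLUMN_RENAMES = {
    "Total_GFA": "Total GFA for Policy",
    "Steam(kBtu)": "SteamUse(kBtu)",
    "percent_sqft_1st": "LargestPropertyUseType Percent GFA",
}

# Made-up single-building frame standing in for the RMI spreadsheet.
raw = pd.DataFrame(
    {"Total_GFA": [50000], "Steam(kBtu)": [0.0], "percent_sqft_1st": [1.0]}
)

# errors="raise" turns a missing source column into a KeyError.
renamed = raw.rename(columns=COLUMN_RENAMES, errors="raise")

try:
    raw.drop(columns=["Total_GFA"]).rename(columns=COLUMN_RENAMES, errors="raise")
    missing_column_detected = False
except KeyError:
    missing_column_detected = True
```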

### January model

In [5]:
ORIG_BUILDING_DATA_PATH = '../data/input_data/rmi_building_analysis_with_new_col_names.csv'

In [6]:
orig_jan_model = BaselineBEPSModel(EMISSIONS_PATH, JAN_TARGETS_PATH, ORIG_BUILDING_DATA_PATH, JAN_FINE_YEARS, FINE_PER_SQ_FT)

In [7]:
orig_jan_model.calculate_baseline_model(2027, 2050)

Model calculations complete. Access the model dataframe as model_name.scenario_results


In [8]:
# Total emissions (kg) for Jan

orig_jan_model.scenario_results['compliant_emissions'].sum()

3430889229.1106

In [9]:
# Percent reduction by 2040

orig_jan_model.get_percent_emissions_reduction_by_given_year(2040)

0.7457294950189703

In [10]:
orig_jan_model.get_total_emissions_by_year()

Emissions by year calculations complete. Access the annual emissions dataframe as model_name.emissions_by_year


In [11]:
orig_jan_model.emissions_by_year

year,compliant_emissions
2027.0,340605000.0
2028.0,316438800.0
2029.0,298493300.0
2030.0,265394400.0
2031.0,240503900.0
2032.0,219451300.0
2033.0,200970800.0
2034.0,185148800.0
2035.0,170261200.0
2036.0,144863800.0


In [13]:
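# NaN never compares equal to anything, so use .isna() rather than `== np.nan` to find rows with no largest use type
orig_jan_model.scenario_results[orig_jan_model.scenario_results['LargestPropertyUseType OSE'].isna()]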

Unnamed: 0,OSEBuildingID,BuildingName,Total GFA for Policy,sq_ft_classification,LargestPropertyUseType OSE,SecondLargestPropertyUseType OSE,ThirdLargestPropertyUseType OSE,year,expected_baseline,expected_baseline_ghgi,city_ghgi_target,compliant_ghgi,compliant_emissions,compliance_status,compliance_fees
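
When filtering for missing use types, keep in mind that NaN never compares equal to anything, including itself. An `==` comparison against `np.nan` therefore selects no rows at all, while `.isna()` selects the missing ones. A small sketch with made-up data (the three buildings below are not from the dataset):

```python
import numpy as np
import pandas as pd

# Made-up frame: building 2 has no largest use type.
df = pd.DataFrame(
    {
        "OSEBuildingID": [1, 2, 3],
        "LargestPropertyUseType OSE": ["Retail", np.nan, "Office"],
    }
)

# Equality with NaN is always False, so this filter is always empty.
eq_rows = df[df["LargestPropertyUseType OSE"] == np.nan]

# isna() is the reliable way to find missing values.
na_rows = df[df["LargestPropertyUseType OSE"].isna()]
```

Here `eq_rows` is empty even though one value is missing, and `na_rows` contains building 2.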


### June proposal

In [14]:
orig_june_model = BaselineBEPSModel(EMISSIONS_PATH, JUNE_TARGETS_PATH, ORIG_BUILDING_DATA_PATH, JUNE_FINE_YEARS, FINE_PER_SQ_FT)

In [15]:
orig_june_model.calculate_baseline_model(2027, 2050)

Model calculations complete. Access the model dataframe as model_name.scenario_results


In [16]:
orig_june_model.scenario_results['compliant_emissions'].sum()

4896559099.697521

In [17]:
orig_june_model.get_percent_emissions_reduction_by_given_year(2040)

0.65034777451687

In [18]:
orig_june_model.get_total_emissions_by_year()

Emissions by year calculations complete. Access the annual emissions dataframe as model_name.emissions_by_year


In [19]:
orig_june_model.emissions_by_year

year,compliant_emissions
2027.0,438739600.0
2028.0,438739600.0
2029.0,438739600.0
2030.0,405100700.0
2031.0,347846400.0
2032.0,314811000.0
2033.0,292325600.0
2034.0,275747800.0
2035.0,246576300.0
2036.0,212650400.0


### Difference between January and June

What fraction less CO2 will be released under the January plan than under the June plan, using the original RMI spreadsheet?

In [20]:
1 - (orig_jan_model.scenario_results['compliant_emissions'].sum() / orig_june_model.scenario_results['compliant_emissions'].sum())

0.2993264945331633
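
For reference, the same comparison can be reproduced from the two totals printed earlier, which also gives the absolute difference in kg. This sketch only reuses the numbers shown in the outputs above; the variable names are my own.

```python
# Totals of compliant_emissions (kg) as printed in the cells above.
jan_total_kg = 3430889229.1106
june_total_kg = 4896559099.697521

# Absolute and fractional reduction of January relative to June.
absolute_kg = june_total_kg - jan_total_kg
fraction = 1 - jan_total_kg / june_total_kg

print(f"{absolute_kg:,.0f} kg less under January ({fraction:.1%} of June's total)")
```

That works out to roughly 1.47 billion kg, or about 30% of the June total.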

## Original RMI spreadsheet with corrected column names

The numbers above are off because the wrong column was labeled as 'Total GFA for Policy'. It should have been the 'Total_sqft' column.

Running again.

In [None]:
orig_building = pd.read_csv(ORIG_BUILDING_DATA_PATH)
orig_building.columns

In [None]:
orig_building.columns = ['Unnamed: 0.3', 'Unnamed: 0.2', 'Unnamed: 0.1', 'Unnamed: 0',
       'OSEBuildingID', 'BuildingName', 'BuildingType', 'Type_of_Bulding',
       'PropertyGFATotal', 'PropertyGFABuilding(s)', 'PropertyGFAParking',
       'Total GFA for Policy', 'LargestPropertyUseType Percent GFA',
       'SecondLargestPropertyUseType Percent GFA',
       'ThirdLargestPropertyUseType Percent GFA', 'LargestPropertyUseType',
       'LargestPropertyUseType OSE', 'LargestPropertyUseTypeGFA',
       'LargestPropertyUseTypeGFA Analysis', 'SecondLargestPropertyUseType',
       'SecondLargestPropertyUseType OSE', 'SecondLargestPropertyUseTypeGFA',
       'SecondLargestPropertyUseTypeGFA Analysis',
       'ThirdLargestPropertyUseType', 'ThirdLargestPropertyUseType OSE',
       'ThirdLargestPropertyUseTypeGFA',
       'ThirdLargestPropertyUseTypeGFA Analysis', 'Electricity(kBtu)',
       'SteamUse(kBtu)', 'NaturalGas(kBtu)', 'TotalGHGEmissions',
       'GHGEmissionsIntensity', 'Total_GFA',
       'sq_ft_classification']

In [None]:
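# index=False avoids adding another 'Unnamed: 0' column every time the file is rewritten
orig_building.to_csv(ORIG_BUILDING_DATA_PATH, index=False)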
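
The `Unnamed: 0.*` names in the column list above look like the usual side effect of saving a DataFrame with its index and reading it back, although that is an assumption about how this file was produced. A minimal sketch of the effect, using an in-memory buffer instead of the real CSV:

```python
import io

import pandas as pd


def round_trip(df, **to_csv_kwargs):
    """Write df to an in-memory CSV and read it back."""
    buffer = io.StringIO()
    df.to_csv(buffer, **to_csv_kwargs)
    buffer.seek(0)
    return pd.read_csv(buffer)


start = pd.DataFrame({"OSEBuildingID": [1, 2], "Total_GFA": [50000, 75000]})

# Default to_csv writes the index, which comes back as an unnamed column.
once = round_trip(start)
twice = round_trip(once)

# With index=False the columns survive unchanged.
clean = round_trip(round_trip(start, index=False), index=False)

unnamed_after_two = [c for c in twice.columns if c.startswith("Unnamed")]
```

Each default round trip adds one more unnamed column, whereas the `index=False` version keeps the original two columns.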

In [None]:
orig_jan_model_updated_col_name = BaselineBEPSModel(EMISSIONS_PATH, JAN_TARGETS_PATH, ORIG_BUILDING_DATA_PATH, JAN_FINE_YEARS, FINE_PER_SQ_FT)

In [None]:
orig_jan_model_updated_col_name.calculate_baseline_model(2027, 2050)

In [None]:
# Total emissions (kg) for Jan

orig_jan_model_updated_col_name.scenario_results['compliant_emissions'].sum()

In [None]:
orig_jan_model_updated_col_name.get_percent_emissions_reduction_by_given_year(2040)

In [None]:
orig_jan_model_updated_col_name.get_total_emissions_by_year()

In [None]:
orig_jan_model_updated_col_name.emissions_by_year

In [None]:
orig_june_model_updated_col_name = BaselineBEPSModel(EMISSIONS_PATH, JUNE_TARGETS_PATH, ORIG_BUILDING_DATA_PATH, JUNE_FINE_YEARS, FINE_PER_SQ_FT)

In [None]:
orig_june_model_updated_col_name.calculate_baseline_model(2027, 2050)

In [None]:
# Total emissions (kg) for June

orig_june_model_updated_col_name.scenario_results['compliant_emissions'].sum()

In [None]:
orig_june_model_updated_col_name.get_percent_emissions_reduction_by_given_year(2040)

In [None]:
orig_june_model_updated_col_name.get_total_emissions_by_year()
orig_june_model_updated_col_name.emissions_by_year

In [None]:
# Fractional difference between Jan and June
1 - (orig_jan_model_updated_col_name.scenario_results['compliant_emissions'].sum() / orig_june_model_updated_col_name.scenario_results['compliant_emissions'].sum())