# Gridded EPA Methane Inventory
## Category: 5D Wastewater Treatment

***
#### Authors: 
Erin E. McDuffie, Bram Maasakkers, Maggie Schultz
#### Date Last Updated: 
see Step 0
#### Notebook Purpose: 
This notebook calculates and reports annual gridded (0.1°x0.1°) methane emission fluxes (molecules CH4/cm2/s) from wastewater treatment facilities (total, industrial, and domestic) in the CONUS region for 2012-2018. 
#### Summary & Notes:
National EPA GHGI emissions from domestic and industrial wastewater treatment (split into sub-sources) are read in from the EPA GHG Inventory wastewater treatment workbook. Emissions are available as national totals for the entire time series. State-level allocations are also available from the 2021 State GHG Inventory for two industrial sectors (pulp and paper manufacturing, and food and beverage manufacturing) within the industrial wastewater category. National industrial wastewater emissions are allocated to the state level (for each subgroup) using these relative state-level emissions data. State-level emissions for each subgroup are then allocated to the 0.01°x0.01° CONUS grid using gridded data of facility-level emissions for each subgroup. Data are then re-gridded to 0.1°x0.1°. National domestic wastewater emissions are allocated directly to the 0.1°x0.1° CONUS grid using relative facility-level wastewater flows (see Step 2.2). All data are then converted to fluxes (molecules CH4/cm2/s). Annual emission fluxes (molecules CH4/cm2/s) for total, domestic, and industrial wastewater treatment are written to final netCDFs in the ‘/code/Final_Gridded_Data/’ folder. 
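
As a rough illustration of the final unit conversion mentioned above, the sketch below converts an annual emission mass per grid cell into a flux in molecules CH4/cm2/s. The function name, the input unit (metric tons CH4 per year), and the use of a non-leap 8760-hour year are assumptions made for illustration; the notebook's own conversion may differ in detail.

```python
import numpy as np

AVOGADRO = 6.02214129e23    # molecules/mol
MOLAR_CH4 = 16.04           # g/mol
G_PER_MT = 1e6              # grams per metric ton
SEC_PER_YR = 8760 * 3600    # seconds in a non-leap year (assumption)

def mt_per_yr_to_flux(emi_mt, area_cm2):
    """Convert emissions (metric tons CH4/yr per cell) to molecules CH4/cm2/s.

    Hypothetical helper: the name and the input units are illustrative only.
    """
    emi_mt = np.asarray(emi_mt, dtype=float)
    area_cm2 = np.asarray(area_cm2, dtype=float)
    molecules_per_yr = emi_mt * G_PER_MT / MOLAR_CH4 * AVOGADRO
    return molecules_per_yr / SEC_PER_YR / area_cm2

# Example: 1 metric ton/yr emitted from a 1 km2 cell (1e10 cm2)
example_flux = mt_per_yr_to_flux(1.0, 1e10)
```

Because the conversion is linear in the emission mass, it can be applied to a whole gridded array at once by passing the matching grid-cell area array.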
***

-------
## Step 0. Set-Up Notebook Modules, Functions, and Local Parameters and Constants
_____

In [None]:
#Confirm working directory & print last update time
import os
import time
modtime = os.path.getmtime('./5D_Wastewater.ipynb')
modificationTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(modtime))
print("This file was last modified on: ", modificationTime)
print('')
print("The directory we are working in is {}" .format(os.getcwd()))

In [None]:
## Include plots within notebook
%matplotlib inline

In [None]:
# Import base modules
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import re
import pyodbc
import PyPDF2 as pypdf
import tabula as tb
import shapefile as shp
from datetime import datetime
from copy import copy
from scipy.interpolate import interp1d
import geopy
from geopy.geocoders import Nominatim

# Import additional modules
from mpl_toolkits.basemap import Basemap

# Load netCDF (for manipulating netCDF file types)
from netCDF4 import Dataset

# Set up ticker
#import matplotlib.ticker as ticker

#add path for the global function module (file)
import sys
module_path = os.path.abspath(os.path.join('../Global_Functions/'))
if module_path not in sys.path:
    sys.path.append(module_path)

# Load functions
import data_load_functions as data_load_fn
import data_functions as data_fn
import data_IO_functions as data_IO_fn
import data_plot_functions as data_plot_fn

In [None]:
#INPUT Files
# Assign global file names
global_filenames = data_load_fn.load_global_file_names()
State_ANSI_inputfile = global_filenames[0]
#County_ANSI_inputfile = global_filenames[1]
pop_map_inputfile = global_filenames[2]
Grid_area01_inputfile = global_filenames[3]
Grid_area001_inputfile = global_filenames[4]
Grid_state001_ansi_inputfile = global_filenames[5]
#Grid_county001_ansi_inputfile = global_filenames[6]
globalinputlocation = global_filenames[0][0:20]
print(globalinputlocation)

# EPA Inventory Data
EPA_ww_inputfile = globalinputlocation+'GHGI/Ch7_Waste/WastewaterTreatment18_for PR_CRF_REVISED_02032020.xlsx'

#proxy mapping file
Wastewater_Mapping_inputfile = './InputData/WastewaterTreatment_ProxyMapping.xlsx'

# GHGRP Data
ghgrp_emi_ii_inputfile = './InputData/ghgrp_subpart_II.csv'
ghgrp_facility_ii_inputfile = './InputData/SubpartII_Facilities.csv'

#ECHO Data
echo_inputfile = './InputData/ECHO/ECHO_'

cwns_inputfile = './InputData/CWNS/Final Clean CWNS 2004.mdb'

#OUTPUT FILES
gridded_outputfile = '../Final_Gridded_Data/EPA_v2_5D_Wastewater_Treatment.nc'
gridded_dom_outputfile = '../Final_Gridded_Data/EPA_v2_5D_Wastewater_Treatment_Domestic.nc'
gridded_ind_outputfile = '../Final_Gridded_Data/EPA_v2_5D_Wastewater_Treatment_Industrial.nc'

netCDF_description = 'Gridded EPA Inventory - Wastewater Treatment Emissions - IPCC Source Category 5D'
netCDF_dom_description = 'Gridded EPA Inventory - Wastewater Treatment and Discharge Emissions - IPCC Source Category 5D - Domestic'
netCDF_ind_description = 'Gridded EPA Inventory - Wastewater Treatment and Discharge Emissions - IPCC Source Category 5D - Industrial'

title_str = "EPA methane emissions from wastewater treatment"
title_str_dom = "EPA methane emissions from domestic wastewater"
title_str_ind = "EPA methane emissions from industrial wastewater"
title_diff_str = "Emissions from wastewater difference: 2018-2012"
title_diff_str_dom = "Emissions from domestic wastewater treatment difference: 2018-2012"
title_diff_str_ind = "Emissions from industrial wastewater treatment difference: 2018-2012"

#output gridded proxy data
grid_emi_outputfile = '../Final_Gridded_Data/Extension/v2_input_data/Wastewater_Grid_Emi.nc'

In [None]:
# Define local variables
start_year = 2012  #First year in emission timeseries
end_year = 2018    #Last year in emission timeseries
year_range = [*range(start_year, end_year+1,1)] #List of emission years
year_range_str=[str(i) for i in year_range]
num_years = len(year_range)
num_inv_years = len([*range(1990, end_year+1,1)]) #Number of inventory years

# Define constants
Avogadro   = 6.02214129 * 10**(23)  #molecules/mol
Molarch4   = 16.04                  #g/mol
Res01      = 0.1                    # degrees
Res_01     = 0.01                   # degrees
hrs_to_yrs = 8760                   #number of hours in a year
g_to_mt    = 1*10**(-6)             # grams to metric ton

# Continental US Lat/Lon Limits (for netCDF files)
Lon_left = -130       #deg
Lon_right = -60       #deg
Lat_low  = 20         #deg
Lat_up  = 55          #deg
loc_dimensions = [Lat_low, Lat_up, Lon_left, Lon_right]

ilat_start = int((90+Lat_low)/Res01) #1100:1450 (continental US range)
ilat_end = int((90+Lat_up)/Res01)
ilon_start = abs(int((-180-Lon_left)/Res01)) #500:1200 (continental US range)
ilon_end = abs(int((-180-Lon_right)/Res01))

# Number of days in each month
month_day_leap  = [  31,  29,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_day_nonleap = [  31,  28,  31,  30,  31,  30,  31,  31,  30,  31,  30,  31]
month_tag = ['01','02','03','04','05','06','07','08','09','10','11','12']
month_dict = {'January':1, 'February':2,'March':3,'April':4,'May':5,'June':6, 'July':7,'August':8,'September':9,'October':10,\
             'November':11,'December':12}

# Month arrays
month_range_str = ['January','February','March','April','May','June','July','August','September','October','November','December']
num_months = len(month_range_str)

In [None]:
%%javascript
IPython.OutputArea.auto_scroll_threshold = 9999;
//prevent auto-scrolling

In [None]:
# Track run time
ct = datetime.now() 
it = ct.timestamp() 
print("current time:", ct) 

____
## Step 1. Load in State ANSI data, and Area Maps
_____

In [None]:
# State-level ANSI Data
#Read the state ANSI file array
State_ANSI, name_dict, abbr_dict = data_load_fn.load_state_ansi(State_ANSI_inputfile)[0:3]
#QA: number of states
print('Read input file: '+ f"{State_ANSI_inputfile}")
print('Total "States" found: ' + '%.0f' % len(State_ANSI))
print(' ')

# 0.01 x0.01 degree Data
# State ANSI IDs and grid cell area (m2) maps
state_ANSI_map = data_load_fn.load_state_ansi_map(Grid_state001_ansi_inputfile)
area_map, lat001, lon001 = data_load_fn.load_area_map_001(Grid_area001_inputfile)

# 0.1 x0.1 degree data
# grid cell area and state ANSI maps
Lat01, Lon01 = data_load_fn.load_area_map_01(Grid_area01_inputfile)[1:3]
#Select relevant Continental 0.1 x0.1 domain
Lat_01 = Lat01[ilat_start:ilat_end]
Lon_01 = Lon01[ilon_start:ilon_end]
area_matrix_01 = data_fn.regrid001_to_01(area_map, Lat_01, Lon_01)
area_matrix_01 *= 10000  #convert from m2 to cm2
#state_ANSI_map_01 = data_fn.regrid001_to_01(state_ANSI_map, Lat_01, Lon_01)
#del area_map#, lat001, lon001, global_filenames

# Print time
ct = datetime.now() 
print("current time:", ct) 
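
The cell above relies on `data_fn.regrid001_to_01` to aggregate the 0.01° area map to 0.1°. The internals of that function are not shown in this notebook, so the sketch below only illustrates one common way such an aggregation can be done: summing non-overlapping 10x10 blocks of a 2-D array. The function name `block_sum_regrid` and the toy array are hypothetical.

```python
import numpy as np

def block_sum_regrid(fine, factor=10):
    """Sum non-overlapping factor x factor blocks of a 2-D array.

    Hypothetical stand-in for a 0.01-degree to 0.1-degree regridding step.
    """
    nlat, nlon = fine.shape
    if nlat % factor or nlon % factor:
        raise ValueError("array dimensions must be multiples of factor")
    blocks = fine.reshape(nlat // factor, factor, nlon // factor, factor)
    return blocks.sum(axis=(1, 3))

fine_area = np.ones((40, 60))               # toy 0.01-degree field
coarse_area = block_sum_regrid(fine_area)   # shape (4, 6)
```

Summing (rather than averaging) is the appropriate choice for extensive quantities such as grid-cell area or emission mass, since the coarse total then equals the fine total.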

-------------
## Step 2: Read-in and Format Proxy Data
-------------

### Step 2.1 Read In Proxy Mapping File & Make Proxy Arrays

In [None]:
#load GHGI Mapping Groups
names = pd.read_excel(Wastewater_Mapping_inputfile, sheet_name = "GHGI Map - WW", usecols = "A:B",skiprows = 1, header = 0)
colnames = names.columns.values
ghgi_wwt_map = pd.read_excel(Wastewater_Mapping_inputfile, sheet_name = "GHGI Map - WW", usecols = "A:B", skiprows = 1, names = colnames)
#drop rows with no data, remove the parentheses and "+"
ghgi_wwt_map = ghgi_wwt_map[ghgi_wwt_map['GHGI_Emi_Group'] != 'na']
ghgi_wwt_map = ghgi_wwt_map[ghgi_wwt_map['GHGI_Emi_Group'].notna()]
ghgi_wwt_map = ghgi_wwt_map[ghgi_wwt_map['GHGI_Emi_Group'] != '-']
#use literal (non-regex) replacement so the result does not depend on the pandas default for 'regex'
ghgi_wwt_map['GHGI_Source']= ghgi_wwt_map['GHGI_Source'].str.replace("(","",regex=False)
ghgi_wwt_map['GHGI_Source']= ghgi_wwt_map['GHGI_Source'].str.replace(")","",regex=False)
ghgi_wwt_map['GHGI_Source']= ghgi_wwt_map['GHGI_Source'].str.replace("+","",regex=False)
ghgi_wwt_map.reset_index(inplace=True, drop=True)
display(ghgi_wwt_map)

#load emission group - proxy map
names = pd.read_excel(Wastewater_Mapping_inputfile, sheet_name = "Proxy Map - WW", usecols = "A:C",skiprows = 1, header = 0)
colnames = names.columns.values
proxy_wwt_map = pd.read_excel(Wastewater_Mapping_inputfile, sheet_name = "Proxy Map - WW", usecols = "A:C", skiprows = 1, names = colnames)
display((proxy_wwt_map))

#create empty proxy and emission group arrays (for state and months, where needed)
for igroup in np.arange(0,len(proxy_wwt_map)):
    if proxy_wwt_map.loc[igroup, 'Grid_Month_Flag'] ==0:
        vars()[proxy_wwt_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
        vars()[proxy_wwt_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years])
    else:
        vars()[proxy_wwt_map.loc[igroup,'Proxy_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years,num_months])
        vars()[proxy_wwt_map.loc[igroup,'Proxy_Group']+'_nongrid'] = np.zeros([num_years,num_months])
        
    vars()[proxy_wwt_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([num_years])  
        
emi_group_names = np.unique(ghgi_wwt_map['GHGI_Emi_Group'])

print('QA/QC: Is the number of emission groups the same for the proxy and emissions tabs?')
if (len(emi_group_names) == len(np.unique(proxy_wwt_map['GHGI_Emi_Group']))):
    print('PASS')
else:
    print('FAIL')
    print(emi_group_names)
    print(len(emi_group_names))
    print(len(np.unique(proxy_wwt_map['GHGI_Emi_Group'])))
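
The cell above creates one array per proxy and emission group by writing into `vars()`. As an alternative sketch, the same empty arrays can be held in dictionaries keyed by group name, which avoids creating variables dynamically. The column names follow the mapping file used above; the group names, grid dimensions, and dictionary names below are hypothetical toy values.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the proxy mapping tab (group names are hypothetical)
toy_proxy_map = pd.DataFrame({
    'GHGI_Emi_Group': ['Emi_Dom', 'Emi_Ind'],
    'Proxy_Group': ['Map_Dom', 'Map_Ind'],
    'Grid_Month_Flag': [0, 1],
})
n_lat, n_lon, n_years, n_months = 3, 4, 2, 12

proxy_arrays, nongrid_arrays, emi_arrays = {}, {}, {}
for row in toy_proxy_map.itertuples(index=False):
    # Add a month dimension only when the proxy is flagged as monthly
    extra = () if row.Grid_Month_Flag == 0 else (n_months,)
    proxy_arrays[row.Proxy_Group] = np.zeros((n_lat, n_lon, n_years) + extra)
    nongrid_arrays[row.Proxy_Group] = np.zeros((n_years,) + extra)
    emi_arrays[row.GHGI_Emi_Group] = np.zeros(n_years)
```

With this layout, an array is retrieved as `proxy_arrays['Map_Dom']` instead of through `vars()['Map_Dom']`.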

### Step 2.2 Read In Domestic Wastewater Treatment Proxy Data

### Option 1

#### Step 2.2.1. Read in Full ECHO Dataset

In [None]:
#Read in and combine ECHO data
# File tags for regions whose data are split across several files (by state and/or facility type).
# All other regions are stored in a single 'Region<N>' file per year.
echo_region_tags = {
    3: ['DE','DC','MD','PA','VA','WV_POTW','WV_nonPOTW_GPC','WV_nonPOTW_NPD'],
    4: ['AL','FL','GA','KY','MS','NC','SC','TN'],
    5: ['Region5_POTW','Region5_nonPOTW'],
    6: ['AR','LA','NM','OK','TX'],
    9: ['Region9_POTW','Region9_nonPOTW']}

echo_frames = []
for iyear in np.arange(0, num_years):
    print('Year:',year_range[iyear])
    for iregion in range(1, 11): #10 regions of data
        for tag in echo_region_tags.get(iregion, ['Region'+str(iregion)]):
            echo_frames.append(pd.read_csv(echo_inputfile+tag+'_'+year_range_str[iyear]+'.csv',skiprows = 3,low_memory = False))
        print('Loaded Region',iregion,'...')
# DataFrame.append was removed in pandas 2.0, so concatenate all files once instead
echo_full = pd.concat(echo_frames, ignore_index=True)
#display(echo_full)
echo_full = echo_full[["Year","NPDES Permit Number","FRS ID","CWNS ID(s)","Facility Type Indicator","SIC Code","NAICS Code",\
                        "City","State",'County','Facility Latitude','Facility Longitude',\
                       'Average Daily Flow (MGD)','Actual Average Facility Flow (MGD)',\
                       'Total Facility Design Flow (MGD)']]

echo_full = echo_full.fillna(0)


display(echo_full)



In [None]:
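# Use the reported daily flow rate. If not reported, use the total design flow rate (if available, adjusted using
# nationally available ratios of reported flow to capacity). Do this for both POTW and non-POTW facilities.
# Also group data by permit number (retain the maximum flow reported for the permit)
# and filter out water supply SIC codes (4941)

# ASSUMPTION: 'calc_flow_mgd' is aggregated below but is not defined anywhere earlier in this notebook.
# It is initialized here from the reported actual average flow, consistent with the gap filling applied below.
echo_full['calc_flow_mgd'] = echo_full['Actual Average Facility Flow (MGD)']
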
echo_potw = echo_full[(echo_full['Facility Type Indicator'] == 'POTW')& (echo_full['SIC Code'] != 4941.0)].copy()
echo_potw = echo_potw.groupby(['Year','NPDES Permit Number'], as_index = False).agg(\
               {'FRS ID':'first','CWNS ID(s)':'first','SIC Code':'max','NAICS Code':'max','City':'first',\
                'State':'first','County':'first','Facility Latitude':'max','Facility Longitude':'max',\
                'calc_flow_mgd': 'max','Total Facility Design Flow (MGD)': 'max','Average Daily Flow (MGD)': 'max',\
               'Actual Average Facility Flow (MGD)':'max'})
echo_potw.reset_index(inplace=True, drop=True)
#assume incorrect units if reported flows or design capacities are >= 1000 MGD (divide these values by 1e6)
echo_potw.loc[(echo_potw['Average Daily Flow (MGD)']  >= 1000) ,'Average Daily Flow (MGD)'] = \
    echo_potw.loc[(echo_potw['Average Daily Flow (MGD)']  >= 1000) ,'Average Daily Flow (MGD)']/1e6
#same unit check for the design flow
echo_potw.loc[(echo_potw['Total Facility Design Flow (MGD)']  >= 1000) ,'Total Facility Design Flow (MGD)'] = \
    echo_potw.loc[(echo_potw['Total Facility Design Flow (MGD)']  >= 1000) ,'Total Facility Design Flow (MGD)']/1e6
#same unit check for the actual average flow
echo_potw.loc[(echo_potw['Actual Average Facility Flow (MGD)']  >= 1000) ,'Actual Average Facility Flow (MGD)'] = \
    echo_potw.loc[(echo_potw['Actual Average Facility Flow (MGD)']  >= 1000) ,'Actual Average Facility Flow (MGD)']/1e6

#find the ratio of flow to capacity for all facilities that report both
flow_subset = echo_potw.copy()
#find where actual average flow is less than design flow and calculate the national ratio (note: np.mean is used here)
flow_subset = flow_subset[(flow_subset['Actual Average Facility Flow (MGD)'] > 0) & (flow_subset['Total Facility Design Flow (MGD)'] > 0) &\
                         (flow_subset['Actual Average Facility Flow (MGD)'] < flow_subset['Total Facility Design Flow (MGD)'])]
flow_subset['ratio1'] = flow_subset['Actual Average Facility Flow (MGD)']/ flow_subset['Total Facility Design Flow (MGD)']
potw_ratio1 = np.mean(flow_subset['ratio1'])
#find where the actual and average daily flow rates are non-zero and calculate the national median ratio
flow_subset = echo_potw.copy()
flow_subset = flow_subset[(flow_subset['Actual Average Facility Flow (MGD)'] > 0) & (flow_subset['Average Daily Flow (MGD)'] > 0)]
flow_subset['ratio2'] = flow_subset['Actual Average Facility Flow (MGD)']/ flow_subset['Average Daily Flow (MGD)']
potw_ratio2 = np.median(flow_subset['ratio2'])

print(potw_ratio1, potw_ratio2)
#if no facility flow or daily flow is greater than design flow, replace with scaled design flow
echo_potw.loc[(echo_potw['calc_flow_mgd']  <= 0) | \
              (echo_potw['calc_flow_mgd'] > echo_potw['Total Facility Design Flow (MGD)']) ,'calc_flow_mgd'] = \
    echo_potw.loc[(echo_potw['calc_flow_mgd']  <= 0) | \
              (echo_potw['calc_flow_mgd'] > echo_potw['Total Facility Design Flow (MGD)']),'Total Facility Design Flow (MGD)']*potw_ratio1
# if no design flow, replace with scaled average daily flow
echo_potw.loc[(echo_potw['calc_flow_mgd']  <= 0) ,'calc_flow_mgd'] = \
    echo_potw.loc[(echo_potw['calc_flow_mgd']  <= 0) ,'Average Daily Flow (MGD)']*potw_ratio2


#facilities with no reported flow data use the reported capacity data, scaled by the national ratio (applied above)
#count the facilities that have no flow or capacity data in any of the three flow columns
count_temp = echo_potw[((echo_potw['Actual Average Facility Flow (MGD)'] == 0) & (echo_potw['Total Facility Design Flow (MGD)']==0))&\
                      (echo_potw['Average Daily Flow (MGD)']==0)]
#report how many facilities have no data, then keep only facilities with a calculated flow > 0
print(str(count_temp.shape[0]), ' of ', str(echo_potw.shape[0]) , ' POTW facilities have no flow/capacity data')
echo_potw = echo_potw[echo_potw['calc_flow_mgd'] >0]
echo_potw.reset_index(inplace=True, drop=True)

#do the same for non-POTW (filter out water supply facilities)
echo_nonpotw = echo_full[(echo_full['Facility Type Indicator'] == 'NON-POTW')& (echo_full['SIC Code'] != 4941.0)].copy()
echo_nonpotw = echo_nonpotw.groupby(['Year','NPDES Permit Number'], as_index = False).agg(\
               {'FRS ID':'first','CWNS ID(s)':'first','SIC Code':'max','NAICS Code':'max','City':'first',\
                'State':'first','County':'first','Facility Latitude':'max','Facility Longitude':'max',\
                'calc_flow_mgd': 'max','Total Facility Design Flow (MGD)': 'max','Average Daily Flow (MGD)': 'max',\
               'Actual Average Facility Flow (MGD)':'max'})
echo_nonpotw.reset_index(inplace=True, drop=True)

#find the ratio of flow to capacity for all facilities that report both (note: np.mean is used here)
flow_subset = echo_nonpotw.copy()
flow_subset = flow_subset[(flow_subset['Actual Average Facility Flow (MGD)'] > 0) & (flow_subset['Total Facility Design Flow (MGD)'] > 0) &\
                         (flow_subset['Actual Average Facility Flow (MGD)'] < flow_subset['Total Facility Design Flow (MGD)'])]
flow_subset['ratio1'] = flow_subset['Actual Average Facility Flow (MGD)']/ flow_subset['Total Facility Design Flow (MGD)']
nonpotw_ratio1 = np.mean(flow_subset['ratio1'])
#find where the actual and average daily flow rates are non-zero and calculate the national median ratio
flow_subset = echo_nonpotw.copy()
flow_subset = flow_subset[(flow_subset['Actual Average Facility Flow (MGD)'] > 0) & (flow_subset['Average Daily Flow (MGD)'] > 0)]
flow_subset['ratio2'] = flow_subset['Actual Average Facility Flow (MGD)']/ flow_subset['Average Daily Flow (MGD)']
nonpotw_ratio2 = np.median(flow_subset['ratio2'])
#if no facility flow or daily flow is greater than design flow, replace with scaled design flow
echo_nonpotw.loc[(echo_nonpotw['calc_flow_mgd']  <= 0) | \
              (echo_nonpotw['calc_flow_mgd'] > echo_nonpotw['Total Facility Design Flow (MGD)']) ,'calc_flow_mgd'] = \
    echo_nonpotw.loc[(echo_nonpotw['calc_flow_mgd']  <= 0) | \
              (echo_nonpotw['calc_flow_mgd'] > echo_nonpotw['Total Facility Design Flow (MGD)']),'Total Facility Design Flow (MGD)']*nonpotw_ratio1
# if no design flow, replace with scaled average daily flow
echo_nonpotw.loc[(echo_nonpotw['calc_flow_mgd']  <= 0) ,'calc_flow_mgd'] = \
    echo_nonpotw.loc[(echo_nonpotw['calc_flow_mgd']  <= 0) ,'Average Daily Flow (MGD)']*nonpotw_ratio2
count_temp = echo_nonpotw[(echo_nonpotw['Actual Average Facility Flow (MGD)'] == 0) & (echo_nonpotw['Total Facility Design Flow (MGD)']==0)&\
                         (echo_nonpotw['Average Daily Flow (MGD)']==0)]
#filter for all data <= 0 and report how many facilities those are
print(str(count_temp.shape[0]), ' of ', str(echo_nonpotw.shape[0]) , ' non-POTW facilities have no flow/capacity data')
echo_nonpotw = echo_nonpotw[echo_nonpotw['calc_flow_mgd'] >0]
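
The gap-filling order used in the cell above (reported flow first, then scaled design flow, then scaled average daily flow) can be sketched on a small toy table. The short column names, the flow values, and the two ratio values below are hypothetical; in the notebook the ratios are derived from the ECHO data.

```python
import pandas as pd

toy = pd.DataFrame({
    'actual': [2.0, 0.0, 0.0, 0.0],   # reported actual average flow (MGD)
    'design': [4.0, 6.0, 0.0, 0.0],   # total design flow (MGD)
    'daily':  [2.0, 0.0, 3.0, 0.0],   # average daily flow (MGD)
})
ratio1, ratio2 = 0.5, 1.0  # hypothetical actual/design and actual/daily ratios

calc = toy['actual'].copy()
# Step 1: missing flow, or flow greater than design flow -> scaled design flow
use_design = (calc <= 0) | (calc > toy['design'])
calc = calc.mask(use_design, toy['design'] * ratio1)
# Step 2: still missing -> scaled average daily flow
calc = calc.mask(calc <= 0, toy['daily'] * ratio2)
toy['calc_flow_mgd'] = calc
# Facilities with no usable data remain at zero and are dropped
toy_valid = toy[toy['calc_flow_mgd'] > 0]
```

In this toy case the first facility keeps its reported flow, the second falls back to the scaled design flow, the third falls back to the scaled daily flow, and the fourth is dropped.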

#### Step 2.2.2. Format POTW ECHO Data and Place onto CONUS Grid

In [None]:
# Place data (wastewater flow) on CONUS grid
map_dom_wwf = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_dom_wwf_nongrid = np.zeros([num_years])

for iyear in np.arange(0, num_years):
    echo_temp = echo_potw[echo_potw['Year'] == year_range[iyear]]
    echo_temp.reset_index(inplace=True, drop=True)
    for ifacility in np.arange(0, len(echo_temp)):
        if echo_temp['Facility Longitude'][ifacility] > Lon_left and \
            echo_temp['Facility Longitude'][ifacility] < Lon_right and \
            echo_temp['Facility Latitude'][ifacility] > Lat_low and \
            echo_temp['Facility Latitude'][ifacility] < Lat_up:
                
            ilat = int((echo_temp['Facility Latitude'][ifacility] - Lat_low)/Res01)
            ilon = int((echo_temp['Facility Longitude'][ifacility] - Lon_left)/Res01)
            map_dom_wwf[ilat,ilon,iyear] += echo_temp['calc_flow_mgd'][ifacility]
        else:
            map_dom_wwf_nongrid[iyear] += echo_temp['calc_flow_mgd'][ifacility]
                
    print(np.sum(map_dom_wwf[:,:,iyear])+map_dom_wwf_nongrid[iyear])
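
The facility loop above can also be written in vectorized form. The sketch below uses `np.add.at`, which accumulates correctly when several facilities fall in the same grid cell. It assumes the same domain limits and 0.1° resolution as Step 0; the function name and the three toy facilities are hypothetical.

```python
import numpy as np

LAT_LOW, LAT_UP, LON_LEFT, LON_RIGHT, RES = 20, 55, -130, -60, 0.1
n_lat = int(round((LAT_UP - LAT_LOW) / RES))      # 350 rows
n_lon = int(round((LON_RIGHT - LON_LEFT) / RES))  # 700 columns

def grid_point_values(lat, lon, value):
    """Sum point values onto the CONUS grid; return (grid, off-grid total)."""
    lat, lon, value = (np.asarray(a, dtype=float) for a in (lat, lon, value))
    inside = (lon > LON_LEFT) & (lon < LON_RIGHT) & (lat > LAT_LOW) & (lat < LAT_UP)
    ilat = ((lat[inside] - LAT_LOW) / RES).astype(int)
    ilon = ((lon[inside] - LON_LEFT) / RES).astype(int)
    grid = np.zeros((n_lat, n_lon))
    np.add.at(grid, (ilat, ilon), value[inside])  # accumulates repeated cells
    return grid, value[~inside].sum()

# Two facilities in the same cell and one outside the domain
grid, off_grid = grid_point_values(
    lat=[40.05, 40.06, 60.0], lon=[-100.02, -100.03, -100.0], value=[1.0, 2.0, 5.0])
```

As in the loop, the gridded total plus the off-grid total equals the sum of all facility flows, which gives a simple conservation check.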

### Option 2

#### Step 2.2.1. Read in CWNS 2004 Survey (to get classification types)

In [None]:
driver_str = r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ='+cwns_inputfile+';'
conn = pyodbc.connect(driver_str)
cwns_facility_treatment = pd.read_sql("SELECT * FROM [Unit Processes]", conn)
conn.close()
#Select columns
cwns_facility_treatment = cwns_facility_treatment[["AF_NBR","UPB_NAME"]]  
#use literal (non-regex) replacement so the result does not depend on the pandas default for 'regex'
cwns_facility_treatment['UPB_NAME']= cwns_facility_treatment['UPB_NAME'].str.replace("(","",regex=False)
cwns_facility_treatment['UPB_NAME']= cwns_facility_treatment['UPB_NAME'].str.replace(")","",regex=False)

# Determine classification (anaerobic/anaerobic digestor/aerobic)
# based on AerovsAnaero document provided by ERG (in Additional Resources document)
# constructed wetlands classifications are from the 'CW Data' tab in the wastewater inventory workbook
cwns_facility_treatment['aerobic_flag'] = 0
cwns_facility_treatment['anaerobic_flag'] = 0
cwns_facility_treatment['anaerobic_digestor_flag'] = 0
cwns_facility_treatment['const_wetland_flag'] = 0

list_aerobic = ['Activated Bio-Filter ABF','Activated Sludge-Anaerobic/Anoxic/Oxic', 'Activated Sludge-Complete Mix',\
                'Activated Sludge-Contact Stabilization', 'Activated Sludge-Conventional', \
                'Activated Sludge-Extended Aeration', 'Activated Sludge-High Rate', 'Activated Sludge-Other Mode',\
                'Activated Sludge-Pure Oxygen', 'Activated Sludge-Step Aeration', \
                'Activated Sludge With Biological Denitrification', 'Aerated Biosolids Storage', 'Aerated Lagoon', \
                'Aeration System','Aerobic Digestion-Air', 'Aerobic Digestion-Oxygen', \
                'Autothermal Thermophilic Aerobic Digestion-Air', 'Autothermal Thermophilic Aerobic Digestion-Oxygn',\
                'Biological Nitrification-Separate Stage', 'Biological Phosphorus Removal', \
                'Biological Phosphorus Removal-Modified Bardenpho', 'Biological Phosphorus Removal-Phostrip', \
                'Combined Biological Nitrification And BOD Reductn', 'Oxidation Ditch', 'Package Plant', 'Post Aeration',\
                'Preaeration', 'Rapid Infiltration System-No Underdrain', 'Rapid Infiltration System W/Underdrain',\
                'Rotating Biological Contactor RBC', 'Sequencing Batch Reactor SBR', \
                'Slow Rate Application System-No Underdrain', 'Slow Rate Land Application System W/Underdrain', \
                'Trickling Filter', 'Trickling Filter-Other Media', 'Trickling Filter-Plastic Media', \
                'Trickling Filter-Redwood Slats','Trickling Filter-Rock Media']
list_anaerobic = ['Anaerobic Digestion', 'Anaerobic Digestion-Thermophilic', 'Anaerobic Lagoons', 'Biosolids Lagoons',\
                  'Constructed Wetlands', 'Constructed Wetlands Resource Extraction', \
                  'Denitrification Filter-Coarse Media', 'Design/Install Constructed Wetlands', 'Facultative Lagoon',\
                  'Freesurface/WetlandMarsh System', 'Lagoon, Polishing Lagoon', 'Stabilization Pond', \
                  'Waste Treatment Lagoon NO. 359']
list_constructed_wetlands = ['01000039001','01000054004','01000244001','01000353001','04001506001','05000130001',\
                             '05000722001','12000001039','16000163001','19000735001','20002005012','22001335001',\
                             '28000035001','28000155001','28000255001','28000660001','28000890001','28000915001',\
                             '28001005001','28001275001','28001275002','28001275003','28001315001','28001578002',\
                             '28001848001','29001003023','29002313001','29002418002','29002473001','36003149001',\
                             '39004809001','39009009001','40000371001','45000607105','46000015001','46000028001',\
                             '46000057001','46000060001','46000290001','46000291001','46000299001','46000368001',\
                             '46000404001','46000441001','46000448001','46000450001','46000458001','46000486001',\
                             '46000498001','46000555001','48006001001','51000222001','53000095001','55004430001',\
                             '56000093001','01000038001','01000064001','01000086001','01000248001','01000308001',\
                             '01000338001','01000348001','12000119001','12000139001','20000980001','36003204003']

for ifacility in np.arange(0, len(cwns_facility_treatment)):
    #Aerobic
    if cwns_facility_treatment.loc[ifacility,'UPB_NAME'] in list_aerobic:
        cwns_facility_treatment.loc[ifacility,'aerobic_flag'] = 1
    #Anaerobic
    if cwns_facility_treatment.loc[ifacility,'UPB_NAME'] in list_anaerobic:
        #Anaerobic digestors
        if 'anaerobic digest' in cwns_facility_treatment.loc[ifacility,'UPB_NAME'].lower():
            cwns_facility_treatment.loc[ifacility,'anaerobic_digestor_flag'] = 1
        #anaerobic systems without digestors
        else:
            cwns_facility_treatment.loc[ifacility,'anaerobic_flag'] = 1
    #constructed wetlands
    if cwns_facility_treatment.loc[ifacility,'AF_NBR'] in list_constructed_wetlands:
        cwns_facility_treatment.loc[ifacility,'const_wetland_flag'] = 1

#Compress on AF_NBR, using max of flags
cwns_facility_treatment = cwns_facility_treatment.groupby(['AF_NBR'], as_index = False).agg(\
               {'aerobic_flag':'max','anaerobic_flag':'max','anaerobic_digestor_flag':'max','const_wetland_flag':'first'})
display(cwns_facility_treatment)  
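
The row-by-row flagging above can be sketched in vectorized form with `Series.isin` and `Series.str.contains`. The toy table and the shortened name lists below are illustrative only (the process names are a small subset of the lists above, and the facility numbers are hypothetical).

```python
import pandas as pd

toy_units = pd.DataFrame({
    'AF_NBR':   ['A1', 'A1', 'B2', 'C3'],
    'UPB_NAME': ['Aerated Lagoon', 'Anaerobic Digestion',
                 'Stabilization Pond', 'Unknown Process'],
})
aerobic_names = ['Aerated Lagoon', 'Oxidation Ditch']
anaerobic_names = ['Anaerobic Digestion', 'Stabilization Pond']

is_anaerobic = toy_units['UPB_NAME'].isin(anaerobic_names)
is_digester = toy_units['UPB_NAME'].str.lower().str.contains('anaerobic digest', regex=False)

toy_units['aerobic_flag'] = toy_units['UPB_NAME'].isin(aerobic_names).astype(int)
# Anaerobic processes are split into digesters and all other anaerobic systems
toy_units['anaerobic_digestor_flag'] = (is_anaerobic & is_digester).astype(int)
toy_units['anaerobic_flag'] = (is_anaerobic & ~is_digester).astype(int)

# Collapse to one row per facility, keeping a flag if any unit process set it
flag_cols = ['aerobic_flag', 'anaerobic_flag', 'anaerobic_digestor_flag']
toy_flags = toy_units.groupby('AF_NBR', as_index=False)[flag_cols].max()
```

A facility with several unit processes (such as `A1` here) can carry more than one flag after the groupby step, which matches the behavior of the loop.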

#### Step 2.2.2. Match ECHO and CWNS datasets

In [None]:
#loop through ECHO Data - try to match by CWNS number and record classification type

echo_potw['aerobic_flag'] = 0
echo_potw['anaerobic_flag'] = 0
echo_potw['anaerobic_digestor_flag'] = 0
echo_potw['const_wetland_flag'] = 0
echo_potw['cwns_match'] = 0

for ifacility in np.arange(0, len(echo_potw)):
    imatch = np.where(echo_potw['CWNS ID(s)'][ifacility] == cwns_facility_treatment['AF_NBR'])[0]
    if len(imatch)>0:
        echo_potw.loc[ifacility,'aerobic_flag'] = cwns_facility_treatment.loc[imatch[0],'aerobic_flag']
        echo_potw.loc[ifacility,'anaerobic_flag'] = cwns_facility_treatment.loc[imatch[0],'anaerobic_flag']
        echo_potw.loc[ifacility,'anaerobic_digestor_flag'] = cwns_facility_treatment.loc[imatch[0],'anaerobic_digestor_flag']
        echo_potw.loc[ifacility,'const_wetland_flag'] = cwns_facility_treatment.loc[imatch[0],'const_wetland_flag']
        echo_potw.loc[ifacility,'cwns_match'] = 1
display(echo_potw)
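
The matching loop above can be sketched as a single left merge. This assumes that each `AF_NBR` appears only once in the CWNS table (true after the groupby in Step 2.2.1) and that the two ID columns share the same dtype. The toy IDs and values below are hypothetical.

```python
import pandas as pd

toy_echo = pd.DataFrame({'CWNS ID(s)': ['111', '222', '0'],
                         'calc_flow_mgd': [1.0, 2.0, 3.0]})
toy_cwns = pd.DataFrame({'AF_NBR': ['111', '333'],
                         'aerobic_flag': [1, 0],
                         'anaerobic_flag': [0, 1]})

# Left merge keeps every ECHO facility; '_merge' records whether a match was found
merged = toy_echo.merge(toy_cwns, how='left', left_on='CWNS ID(s)',
                        right_on='AF_NBR', indicator=True)
merged['cwns_match'] = (merged['_merge'] == 'both').astype(int)

# Unmatched facilities get zero flags, as in the loop-based version
flag_cols = ['aerobic_flag', 'anaerobic_flag']
merged[flag_cols] = merged[flag_cols].fillna(0).astype(int)
merged = merged.drop(columns=['AF_NBR', '_merge'])
```

The row count is unchanged by the merge as long as the CWNS IDs are unique, so the result can be used in place of the loop output.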

#### Step 2.2.3. Place wastewater volumes onto CONUS grid for each classification type

In [None]:
#Place wastewater flow onto grid for each type
# For facilities that weren't matched, assign a fraction of the flow to each classification based on the ratio of flows 
# for each classification type each year (so a facility will be considered partially under multiple categories)

# This approach assigns values to each facility type that it's classified as. For example, 
# if a facility is classified as aerobic and anaerobic, the wastewater flow
# is mapped to both the aerobic and anaerobic proxies. This means that the percentages between
# the four categories won't sum to 100% and will double or triple count facilities that have
# multiple classification types

# Place data (wastewater flow) on CONUS grid
map_dom_aero = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_dom_aero_nongrid = np.zeros([num_years])
map_dom_anaero = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_dom_anaero_nongrid = np.zeros([num_years])
map_dom_ad = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_dom_ad_nongrid = np.zeros([num_years])
map_dom_cw = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_dom_cw_nongrid = np.zeros([num_years])

for iyear in np.arange(0, num_years):
    echo_temp = echo_potw[echo_potw['Year'] == year_range[iyear]]
    echo_temp.reset_index(inplace=True, drop=True)
    #note: masks are built from echo_temp (not echo_potw) so that they align with the re-indexed yearly subset
    total_flow = np.sum(echo_temp.loc[(echo_temp['cwns_match']==1) & ((echo_temp['aerobic_flag']==1) | \
                                      (echo_temp['anaerobic_flag']==1) |(echo_temp['anaerobic_digestor_flag']==1)),\
                                      'calc_flow_mgd'])#where matched, aerobic+anaerobic+ad
    per_aero = np.sum(echo_temp.loc[(echo_temp['cwns_match']==1) & (echo_temp['aerobic_flag']==1),\
                                      'calc_flow_mgd'])/total_flow#where matched, aero/total_flow
    per_anaero = np.sum(echo_temp.loc[(echo_temp['cwns_match']==1) & (echo_temp['anaerobic_flag']==1) &\
                                      (echo_temp['anaerobic_digestor_flag']==0), 'calc_flow_mgd'])/total_flow#where matched, anaero (no ad)/total_flow
    per_ad = np.sum(echo_temp.loc[(echo_temp['cwns_match']==1) & (echo_temp['anaerobic_digestor_flag']==1),\
                                      'calc_flow_mgd'])/total_flow#where matched, ad/total_flow
    per_cw = np.sum(echo_temp.loc[(echo_temp['cwns_match']==1) & (echo_temp['const_wetland_flag']==1),\
                                      'calc_flow_mgd'])/total_flow#where matched, cw/sum where not cw
    print('Year:',year_range[iyear])
    print('Percent aerobic flow:',per_aero)
    print('Percent anaerobic flow:',per_anaero)
    print('Percent AD flow:',per_ad)
    print('Percent CW flow:',per_cw)
    print(' ')
    for ifacility in np.arange(0, len(echo_temp)):
        if echo_temp['Facility Longitude'][ifacility] > Lon_left and \
            echo_temp['Facility Longitude'][ifacility] < Lon_right and \
            echo_temp['Facility Latitude'][ifacility] > Lat_low and \
            echo_temp['Facility Latitude'][ifacility] < Lat_up:

            ilat = int((echo_temp['Facility Latitude'][ifacility] - Lat_low)/Res01)
            ilon = int((echo_temp['Facility Longitude'][ifacility] - Lon_left)/Res01)
            if echo_temp['cwns_match'][ifacility] == 0:
                map_dom_aero[ilat,ilon,iyear] += echo_temp['calc_flow_mgd'][ifacility]*per_aero
                map_dom_anaero[ilat,ilon,iyear] += echo_temp['calc_flow_mgd'][ifacility]*per_anaero
                map_dom_ad[ilat,ilon,iyear] += echo_temp['calc_flow_mgd'][ifacility]*per_ad
            if echo_temp['aerobic_flag'][ifacility] ==1:
                map_dom_aero[ilat,ilon,iyear] += echo_temp['calc_flow_mgd'][ifacility]
            if echo_temp['anaerobic_flag'][ifacility] ==1 and echo_temp['anaerobic_digestor_flag'][ifacility] ==0:
                map_dom_anaero[ilat,ilon,iyear] += echo_temp['calc_flow_mgd'][ifacility]
            if echo_temp['anaerobic_digestor_flag'][ifacility] ==1:
                map_dom_ad[ilat,ilon,iyear] += echo_temp['calc_flow_mgd'][ifacility]
            if echo_temp['const_wetland_flag'][ifacility] ==1:
                map_dom_cw[ilat,ilon,iyear] += echo_temp['calc_flow_mgd'][ifacility]
        else:
            if echo_temp['cwns_match'][ifacility] == 0:
                map_dom_aero_nongrid[iyear] += echo_temp['calc_flow_mgd'][ifacility]*per_aero
                map_dom_anaero_nongrid[iyear] += echo_temp['calc_flow_mgd'][ifacility]*per_anaero
                map_dom_ad_nongrid[iyear] += echo_temp['calc_flow_mgd'][ifacility]*per_ad
            if echo_temp['aerobic_flag'][ifacility] ==1:
                map_dom_aero_nongrid[iyear] += echo_temp['calc_flow_mgd'][ifacility]
            if echo_temp['anaerobic_flag'][ifacility] ==1 and echo_temp['anaerobic_digestor_flag'][ifacility] ==0:
                map_dom_anaero_nongrid[iyear] += echo_temp['calc_flow_mgd'][ifacility]
            if echo_temp['anaerobic_digestor_flag'][ifacility] ==1:
                map_dom_ad_nongrid[iyear] += echo_temp['calc_flow_mgd'][ifacility]
            if echo_temp['const_wetland_flag'][ifacility] ==1:
                map_dom_cw_nongrid[iyear] += echo_temp['calc_flow_mgd'][ifacility]
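
The gridding loop above converts each facility's latitude and longitude to a cell index and accumulates its flow there, and facilities outside the domain are summed into a separate "nongrid" total. A minimal vectorized sketch of that idea for a single year follows. The function `grid_points` and the toy domain are hypothetical, and it assumes a regular grid whose lower-left corner is (`lat_low`, `lon_left`).

```python
import numpy as np

def grid_points(lat, lon, values, lat_low, lat_up, lon_left, lon_right, res, shape):
    """Sum point values into grid cells; return (grid, total of points outside the domain)."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    values = np.asarray(values, dtype=float)
    # same strict bounds test as the loop
    inside = (lat > lat_low) & (lat < lat_up) & (lon > lon_left) & (lon < lon_right)
    # truncate to cell indices; clip guards against floating-point round-up at the upper edge
    ilat = np.minimum(((lat[inside] - lat_low) / res).astype(int), shape[0] - 1)
    ilon = np.minimum(((lon[inside] - lon_left) / res).astype(int), shape[1] - 1)
    grid = np.zeros(shape)
    # np.add.at accumulates correctly when several points fall in the same cell
    np.add.at(grid, (ilat, ilon), values[inside])
    return grid, values[~inside].sum()

# toy 2x2 domain with 0.5 degree cells (hypothetical values)
grid_demo, outside_demo = grid_points(
    lat=[24.25, 24.3, 24.75, 30.0], lon=[-99.75, -99.8, -99.25, -99.5],
    values=[1.0, 2.0, 4.0, 8.0],
    lat_low=24.0, lat_up=25.0, lon_left=-100.0, lon_right=-99.0, res=0.5, shape=(2, 2))
```

Plain fancy-index assignment (`grid[ilat, ilon] += values`) would silently drop repeated indices, which is why the sketch uses `np.add.at`.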

#### Step 2.2.4. Population

In [None]:
#Read population density map
pop_den_map = data_load_fn.load_pop_den_map(pop_map_inputfile)

# convert to absolute population and re-grid to 0.1x0.1 degrees (hold population constant over all years)
map_pop_001 = pop_den_map * area_map
map_pop_01 = data_fn.regrid001_to_01(map_pop_001, Lat_01, Lon_01)

map_pop = np.zeros([len(Lat_01),len(Lon_01),num_years])

for iyear in np.arange(0, num_years):
    map_pop[:,:,iyear] = map_pop_01
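
As a small illustration of the step above, the sketch below multiplies a density map by a cell-area map and repeats the result along a year axis. All arrays are toy values with hypothetical names; the real units depend on the population and area inputs loaded earlier in the notebook.

```python
import numpy as np

num_years_demo = 3
density_demo = np.array([[10.0, 20.0], [0.0, 5.0]])  # people per unit area (toy values)
area_demo = np.array([[2.0, 2.0], [1.0, 1.0]])       # area of each cell (toy values)

# absolute population per cell
pop_demo = density_demo * area_demo
# hold population constant over all years by repeating along a new third axis
pop_by_year_demo = np.repeat(pop_demo[:, :, np.newaxis], num_years_demo, axis=2)
```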

### Step 2.3 Read In Industrial Wastewater Treatment Proxy Data

#### Step 2.3.1 Read In GHGRP Subpart II Data

In [None]:
#Read in GHGRP Subpart II Emissions #and place onto CONUS grid

#a) Read in the GHGRP facility data (Subpart II)
facility_info = pd.read_csv(ghgrp_facility_ii_inputfile)
facility_emis = pd.read_csv(ghgrp_emi_ii_inputfile)
#filter emissions data for methane only (in metric tonnes CH4) and for years of interest
facility_emis = facility_emis[facility_emis['II_SUBPART_LEVEL_INFORMATION.GHG_NAME'] == 'Methane']
facility_emis = facility_emis[facility_emis['II_SUBPART_LEVEL_INFORMATION.REPORTING_YEAR'].isin(year_range)]
facility_info = facility_info[facility_info['V_GHG_EMITTER_FACILITIES.YEAR'].isin(year_range)]
facility_info.reset_index(inplace=True, drop=True)
facility_emis.reset_index(inplace=True, drop=True)

#rename common columns and merge into one dataframe
facility_info.rename(columns={'V_GHG_EMITTER_FACILITIES.YEAR':'Year', \
                             'V_GHG_EMITTER_FACILITIES.FACILITY_ID':'Facility_ID', \
                             'V_GHG_EMITTER_FACILITIES.LONGITUDE':'LONGITUDE',
                             'V_GHG_EMITTER_FACILITIES.LATITUDE':'LATITUDE',
                             'V_GHG_EMITTER_FACILITIES.PRIMARY_NAICS_CODE':'NAICS_CODE',
                             'V_GHG_EMITTER_FACILITIES.COUNTY':'COUNTY',
                             'V_GHG_EMITTER_FACILITIES.CITY':'CITY',
                             'V_GHG_EMITTER_FACILITIES.STATE':'STATE'},inplace=True)
facility_emis.rename(columns={'II_SUBPART_LEVEL_INFORMATION.REPORTING_YEAR':'Year', \
                              'II_SUBPART_LEVEL_INFORMATION.FACILITY_ID':'Facility_ID'},inplace=True)
ghgrp_ind = pd.merge(facility_info, facility_emis)
ghgrp_ind['emis_kt_tot'] = ghgrp_ind['II_SUBPART_LEVEL_INFORMATION.GHG_QUANTITY']/1e3 #convert metric tonnes to kt

print('NAICS codes in Subpart II:', np.unique(ghgrp_ind['NAICS_CODE']))
#split into different industry sectors (to later match with ECHO plant data)
ghgrp_ind['NAICS_CODE'] = ghgrp_ind['NAICS_CODE'].astype(str)

#pulp and paper
ghgrp_pp = ghgrp_ind[ghgrp_ind['NAICS_CODE'].str.startswith('322')].copy()
ghgrp_pp.reset_index(inplace=True, drop=True)
#red meat and poultry
ghgrp_mp = ghgrp_ind[ghgrp_ind['NAICS_CODE'].str.startswith('3116')].copy()
ghgrp_mp.reset_index(inplace=True, drop=True)
#fruits and veggies
ghgrp_fv = ghgrp_ind[ghgrp_ind['NAICS_CODE'].str.startswith('3114')].copy()
ghgrp_fv.reset_index(inplace=True, drop=True)
#ethanol production
ghgrp_eth = ghgrp_ind[ghgrp_ind['NAICS_CODE'].str.startswith('325193')].copy()
ghgrp_eth.reset_index(inplace=True, drop=True)
##CORRECT REPORTING ERROR IN 2014 & negative reported value###
ghgrp_eth.loc[(ghgrp_eth['Year']==2014) & (ghgrp_eth['V_GHG_EMITTER_FACILITIES.FACILITY_NAME']=='Hub City Energy'),'emis_kt_tot'] /= 100
ghgrp_eth.loc[:,'emis_kt_tot'] = abs(ghgrp_eth.loc[:,'emis_kt_tot'])
###
#breweries
ghgrp_brew = ghgrp_ind[ghgrp_ind['NAICS_CODE'].str.startswith('312120')].copy()
ghgrp_brew.reset_index(inplace=True, drop=True)
#petroleum refining
ghgrp_ref = ghgrp_ind[ghgrp_ind['NAICS_CODE'].str.startswith('324121')].copy()
ghgrp_ref.reset_index(inplace=True, drop=True)
if len(ghgrp_ref) ==0: #if no data, create blank dataframe
    df2 = {'Year':2012}
    ghgrp_ref = pd.concat([ghgrp_ref, pd.DataFrame([df2])], ignore_index=True) #DataFrame.append was removed in pandas 2.0
    ghgrp_ref = ghgrp_ref.fillna(0)
    
print(' ')
print('GHGRP Emissions (kt)')
for iyear in np.arange(0, num_years):
    print('Year',year_range[iyear],':')
    print('Pulp & Paper   :',np.sum(ghgrp_pp.loc[ghgrp_pp['Year']==year_range[iyear],'emis_kt_tot']))
    print('Meat & Poultry :',np.sum(ghgrp_mp.loc[ghgrp_mp['Year']==year_range[iyear],'emis_kt_tot']))
    print('Fruit & Veggies:',np.sum(ghgrp_fv.loc[ghgrp_fv['Year']==year_range[iyear],'emis_kt_tot']))
    print('Ethanol        :',np.sum(ghgrp_eth.loc[ghgrp_eth['Year']==year_range[iyear],'emis_kt_tot']))
    print('Breweries      :',np.sum(ghgrp_brew.loc[ghgrp_brew['Year']==year_range[iyear],'emis_kt_tot']))
    print('Refining (kt)  :',np.sum(ghgrp_ref.loc[ghgrp_ref['Year']==year_range[iyear],'emis_kt_tot']))
    print(' ')
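
The per-sector yearly sums printed above can also be collected into one table. The following is a sketch on toy data, assuming the same `NAICS_CODE`, `Year`, and `emis_kt_tot` columns; the function `sector_totals` and the prefix dictionary are hypothetical and list only three of the sectors.

```python
import pandas as pd

SECTOR_PREFIXES_DEMO = {'Pulp & Paper': '322', 'Meat & Poultry': '3116', 'Ethanol': '325193'}

def sector_totals(df, prefixes):
    """Return a Year x sector table of summed emissions (kt), selected by NAICS prefix."""
    codes = df['NAICS_CODE'].astype(str)
    columns = {}
    for name, prefix in prefixes.items():
        subset = df[codes.str.startswith(prefix)]
        columns[name] = subset.groupby('Year')['emis_kt_tot'].sum()
    # years with no facilities in a sector appear as NaN after alignment; report them as 0
    return pd.DataFrame(columns).fillna(0.0)

# toy GHGRP-like records (hypothetical values)
ghgrp_demo = pd.DataFrame({'Year': [2012, 2012, 2013, 2012],
                           'NAICS_CODE': [322121, 322130, 311611, 325193],
                           'emis_kt_tot': [1.0, 2.0, 0.5, 0.25]})
totals_demo = sector_totals(ghgrp_demo, SECTOR_PREFIXES_DEMO)
```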

#### Step 2.3.2. Read in EPA Industrial Emissions

In [None]:
#Read in EPA industrial wastewater emissions
names = pd.read_excel(EPA_ww_inputfile, sheet_name = "InvDB", usecols = "E:AJ",skiprows = 15, header = 0)
colnames = names.columns.values
EPA_emi_ind_ww_CH4 = pd.read_excel(EPA_ww_inputfile, sheet_name = "InvDB", usecols = "E:AJ", skiprows = 15, names = colnames)
#drop rows with no data, remove the parentheses and ""
EPA_emi_ind_ww_CH4 = EPA_emi_ind_ww_CH4.fillna('')
EPA_emi_ind_ww_CH4.rename(columns={EPA_emi_ind_ww_CH4.columns[0]:'Source'}, inplace=True)
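#remove literal parentheses (regex=False so the behavior does not depend on the pandas version default)
EPA_emi_ind_ww_CH4['Source']= EPA_emi_ind_ww_CH4['Source'].str.replace("(","", regex=False)
EPA_emi_ind_ww_CH4['Source']= EPA_emi_ind_ww_CH4['Source'].str.replace(")","", regex=False)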
EPA_emi_ind_ww_CH4.drop(EPA_emi_ind_ww_CH4.columns[[1,2]], axis =1, inplace= True)
EPA_emi_ind_ww_CH4 = EPA_emi_ind_ww_CH4.drop(columns = [n for n in range(1990, start_year,1)])
EPA_emi_ind_ww_CH4.iloc[:,1:] = EPA_emi_ind_ww_CH4.iloc[:,1:]*1000 #convert Tg to kt
display(EPA_emi_ind_ww_CH4)

#### Step 2.3.3. Format ECHO data by industry segment 

##### Step 2.3.3.1. Format ECHO Non-POTW Data

In [None]:
echo_nonpotw['NAICS Code'] = echo_nonpotw['NAICS Code'].astype(str)
echo_nonpotw['SIC Code'] = echo_nonpotw['SIC Code'].astype(str)

# extract industry specific facilities from non-POTW list
# SIC codes determined by cross-walking with relevant NAICS codes
#pulp and paper
echo_pp = echo_nonpotw[(echo_nonpotw['NAICS Code'].str.startswith('3221'))\
                       |(echo_nonpotw['SIC Code'].str.contains('|'.join(['2611','2621','2631'])))].copy()
echo_pp.reset_index(inplace=True, drop=True)

#meat and poultry
echo_mp = echo_nonpotw[(echo_nonpotw['NAICS Code'].str.startswith('3116'))\
                       |(echo_nonpotw['SIC Code'].str.contains('|'.join(['0751','2011','2048','2013','5147','2077','2015'])))].copy()
echo_mp.reset_index(inplace=True, drop=True)

#fruit and veggies
echo_fv = echo_nonpotw[(echo_nonpotw['NAICS Code'].str.startswith('3114'))\
                       |(echo_nonpotw['SIC Code'].str.contains('|'.join(['2037','2038','2033','2035','2032','2034','2099'])))].copy()
echo_fv.reset_index(inplace=True, drop=True)

#ethanol
echo_eth = echo_nonpotw[(echo_nonpotw['NAICS Code'].str.startswith('325193'))\
                       |(echo_nonpotw['SIC Code'].str.contains('|'.join(['2869'])))].copy()
echo_eth.reset_index(inplace=True, drop=True)

#breweries
echo_brew = echo_nonpotw[(echo_nonpotw['NAICS Code'].str.startswith('312120'))\
                       |(echo_nonpotw['SIC Code'].str.contains('|'.join(['2082'])))].copy()
echo_brew.reset_index(inplace=True, drop=True)

#petroleum refining
echo_ref = echo_nonpotw[(echo_nonpotw['NAICS Code'].str.startswith('32411'))\
                       |(echo_nonpotw['SIC Code'].str.contains('|'.join(['2911'])))].copy()
echo_ref.reset_index(inplace=True, drop=True)
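
Each sector above is selected with the same pattern: a NAICS prefix test combined with a substring test against a list of SIC codes. A minimal sketch of that pattern as one reusable helper follows. The name `select_industry` and the toy table are hypothetical, and the sketch assumes both code columns can be cast to strings as in the cell above.

```python
import re
import pandas as pd

def select_industry(df, naics_prefix, sic_codes):
    """Return rows whose NAICS code starts with naics_prefix or whose SIC field contains any listed code."""
    naics = df['NAICS Code'].astype(str)
    sic = df['SIC Code'].astype(str)
    # build an alternation pattern such as '2611|2621|2631'
    pattern = '|'.join(re.escape(code) for code in sic_codes)
    mask = naics.str.startswith(naics_prefix) | sic.str.contains(pattern)
    return df[mask].reset_index(drop=True)

# toy non-POTW records (hypothetical values); 'nan' mimics missing codes after astype(str)
nonpotw_demo = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'],
                             'NAICS Code': ['322110', 'nan', '311611', 'nan'],
                             'SIC Code': ['nan', '2621 2631', 'nan', '4952']})
pulp_demo = select_industry(nonpotw_demo, '3221', ['2611', '2621', '2631'])
```

Because `str.contains` is a substring test, a facility listing several SIC codes in one field is selected if any of them matches, which is the same behavior as the cell above.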


##### Step 2.3.3.2. Pulp and Paper

In [None]:
# Step 1 -  try to match facilities to ghgrp based on location 
# If there is no match with a GHGRP facility, add it to the full list of facilities

echo_pp['ghgrp_match'] = 0
echo_pp['emis_kt'] = 0
echo_pp['EF'] = 0
ghgrp_pp.loc[:,'found']=0

for ifacility in np.arange(0,len(echo_pp)):
    for ighgrp in np.arange(0,len(ghgrp_pp)):
        dist = np.sqrt((ghgrp_pp.loc[ighgrp,'LATITUDE']-echo_pp.loc[ifacility,'Facility Latitude'])**2\
                       +(ghgrp_pp.loc[ighgrp,'LONGITUDE']-echo_pp.loc[ifacility,'Facility Longitude'])**2)
        if dist < 0.025 and ghgrp_pp.loc[ighgrp,'Year'] == echo_pp.loc[ifacility,'Year']:
            ghgrp_pp.loc[ighgrp,'found'] = 1
            echo_pp.loc[ifacility,'ghgrp_match'] = 1
            echo_pp.loc[ifacility,'emis_kt'] = ghgrp_pp.loc[ighgrp,'emis_kt_tot']
        else:
            continue

print('Found (%):',100*np.sum(echo_pp['ghgrp_match']/len(echo_pp)))
print('GHGRP not found:',len(ghgrp_pp[ghgrp_pp['found']==0]))
print('GHGRP found:',len(ghgrp_pp[ghgrp_pp['found']==1]))
print('Total Emis (kt):',np.sum(echo_pp['emis_kt']))

# Step 2 - add additional GHGRP facilities (to capture all GHGRP emissions - deviates from GHGI methods)
# for each extra facility in the ghgrp dataset, append an extra row, then fill in the emissions values for each year
for ifacility in np.arange(0,len(ghgrp_pp)):
    if ghgrp_pp.loc[ifacility,'found'] ==0:
        facility_id = ghgrp_pp.loc[ifacility,'Facility_ID']
        df2 = {'Year':ghgrp_pp.loc[ifacility,'Year'],'State':ghgrp_pp.loc[ifacility,'STATE'], 'City':ghgrp_pp.loc[ifacility,'CITY'],\
               'County':ghgrp_pp.loc[ifacility,'COUNTY'],'ghgrp_match': 1, 'Facility Latitude': ghgrp_pp.loc[ifacility,'LATITUDE'], \
               'Facility Longitude':ghgrp_pp.loc[ifacility,'LONGITUDE'], 'EF':0,\
               'calc_flow_mgd':0,'emis_kt':ghgrp_pp.loc[ifacility,'emis_kt_tot']}
        echo_pp = pd.concat([echo_pp, pd.DataFrame([df2])], ignore_index=True) #DataFrame.append was removed in pandas 2.0
        ghgrp_pp.loc[ifacility,'found'] =1

#Step 3 -  distribute remaining emissions difference based on the relative wastewater flow at each facility
for iyear in np.arange(0, num_years):
    epa_emi  = EPA_emi_ind_ww_CH4.loc[EPA_emi_ind_ww_CH4['Source'] == 'Pulp and Paper',year_range[iyear]].values[0]
    #print(epa_emi)
    sum_emi = np.sum(echo_pp.loc[echo_pp['Year']==year_range[iyear],'emis_kt'])
    emi_diff = max((epa_emi-sum_emi),0)
    print(emi_diff)
    total_flow = np.sum(echo_pp.loc[(echo_pp['Year']==year_range[iyear]) & (echo_pp['ghgrp_match']==0),'calc_flow_mgd'])
    
    for ifacility in np.arange(0, len(echo_pp)):
        if echo_pp.loc[ifacility,'Year']==year_range[iyear] and echo_pp.loc[ifacility,'ghgrp_match']==0:
            echo_pp.loc[ifacility,'emis_kt'] = emi_diff * echo_pp.loc[ifacility,'calc_flow_mgd']/total_flow
    
# Step 4- Place data on CONUS grid
map_ind_pp_emis = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_ind_pp_emis_nongrid = np.zeros([num_years])

for iyear in np.arange(0, num_years):
    echo_temp = echo_pp[echo_pp['Year'] == year_range[iyear]]
    echo_temp.reset_index(inplace=True, drop=True)
    #display(ghgrp_temp)
    for ifacility in np.arange(0, len(echo_temp)):
        if echo_temp['Facility Longitude'][ifacility] > Lon_left and \
            echo_temp['Facility Longitude'][ifacility] < Lon_right and \
            echo_temp['Facility Latitude'][ifacility] > Lat_low and \
            echo_temp['Facility Latitude'][ifacility] < Lat_up:

            ilat = int((echo_temp['Facility Latitude'][ifacility] - Lat_low)/Res01)
            ilon = int((echo_temp['Facility Longitude'][ifacility] - Lon_left)/Res01)
            map_ind_pp_emis[ilat,ilon,iyear] += echo_temp['emis_kt'][ifacility]
        else:
            map_ind_pp_emis_nongrid[iyear] += echo_temp['emis_kt'][ifacility]
    print('Year',year_range[iyear],'Emissions (kt):',np.sum(map_ind_pp_emis[:,:,iyear])+map_ind_pp_emis_nongrid[iyear])
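
Step 3 above spreads the gap between the national GHGI total and the summed GHGRP emissions over the unmatched facilities in proportion to their wastewater flow. The sketch below shows that arithmetic for one year, with an added guard for the case where the unmatched flow sums to zero (the loop above would then divide by zero). The function name `allocate_residual` is hypothetical, and the guard is an assumption about how that case might be handled, not part of the notebook's method.

```python
import numpy as np

def allocate_residual(national_total, reported, flows):
    """Distribute max(national_total - sum(reported), 0) across facilities by relative flow."""
    reported = np.asarray(reported, dtype=float)
    flows = np.asarray(flows, dtype=float)
    residual = max(national_total - reported.sum(), 0.0)
    total_flow = flows.sum()
    if total_flow <= 0:
        # assumption: with no flow data, nothing is allocated and the residual is returned for inspection
        return np.zeros_like(flows), residual
    return residual * flows / total_flow, residual

# toy example: 10 kt national total, 5 kt already reported, two unmatched facilities
alloc_demo, residual_demo = allocate_residual(10.0, reported=[2.0, 3.0], flows=[1.0, 3.0])
```

In this example the residual is 5 kt, split 1.25 kt and 3.75 kt between the two facilities.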

##### Step 2.3.3.3.  Meat and Poultry

In [None]:
# Step 1 - try to match facilities to ghgrp based on location 
# If there is no match with a GHGRP facility, add it to the full list of facilities
echo_mp['ghgrp_match'] = 0
echo_mp['emis_kt'] = 0
echo_mp['EF'] = 0
ghgrp_mp.loc[:,'found']=0

for ifacility in np.arange(0,len(echo_mp)):
    for ighgrp in np.arange(0,len(ghgrp_mp)):
        dist = np.sqrt((ghgrp_mp.loc[ighgrp,'LATITUDE']-echo_mp.loc[ifacility,'Facility Latitude'])**2\
                       +(ghgrp_mp.loc[ighgrp,'LONGITUDE']-echo_mp.loc[ifacility,'Facility Longitude'])**2)
        if dist < 0.025 and ghgrp_mp.loc[ighgrp,'Year'] == echo_mp.loc[ifacility,'Year']:
            ghgrp_mp.loc[ighgrp,'found'] = 1
            echo_mp.loc[ifacility,'ghgrp_match'] = 1
            echo_mp.loc[ifacility,'emis_kt'] = ghgrp_mp.loc[ighgrp,'emis_kt_tot']
        else:
            continue
print('Found (%):',100*np.sum(echo_mp['ghgrp_match']/len(echo_mp)))
print('GHGRP not found:',len(ghgrp_mp[ghgrp_mp['found']==0]))
print('GHGRP found:',len(ghgrp_mp[ghgrp_mp['found']==1]))
print('Total Emis (kt):',np.sum(echo_mp['emis_kt']))


# Step 2 - add additional GHGRP facilities (to capture all GHGRP emissions - deviates from GHGI methods)
# for each extra facility in the ghgrp dataset, append an extra row, then fill in the emissions values for each year
for ifacility in np.arange(0,len(ghgrp_mp)):
    if ghgrp_mp.loc[ifacility,'found'] ==0:
        facility_id = ghgrp_mp.loc[ifacility,'Facility_ID']
        df2 = {'Year':ghgrp_mp.loc[ifacility,'Year'],'State':ghgrp_mp.loc[ifacility,'STATE'], 'City':ghgrp_mp.loc[ifacility,'CITY'],\
               'County':ghgrp_mp.loc[ifacility,'COUNTY'],'ghgrp_match': 1, 'Facility Latitude': ghgrp_mp.loc[ifacility,'LATITUDE'], \
               'Facility Longitude':ghgrp_mp.loc[ifacility,'LONGITUDE'], 'EF':0,\
               'calc_flow_mgd':0,'emis_kt':ghgrp_mp.loc[ifacility,'emis_kt_tot']}
        echo_mp = pd.concat([echo_mp, pd.DataFrame([df2])], ignore_index=True) #DataFrame.append was removed in pandas 2.0
        ghgrp_mp.loc[ifacility,'found'] =1

        
#Step 3 -  distribute remaining emissions difference based on the relative wastewater flow at each facility
for iyear in np.arange(0, num_years):
    epa_emi  = EPA_emi_ind_ww_CH4.loc[EPA_emi_ind_ww_CH4['Source'] == 'Red Meat and Poultry',year_range[iyear]].values[0]
    sum_emi = np.sum(echo_mp.loc[echo_mp['Year']==year_range[iyear],'emis_kt'])
    emi_diff = max((epa_emi-sum_emi),0)
    print(emi_diff)
    total_flow = np.sum(echo_mp.loc[(echo_mp['Year']==year_range[iyear]) & (echo_mp['ghgrp_match']==0),'calc_flow_mgd'])
    for ifacility in np.arange(0, len(echo_mp)):
        if echo_mp.loc[ifacility,'Year']==year_range[iyear] and echo_mp.loc[ifacility,'ghgrp_match']==0:
            echo_mp.loc[ifacility,'emis_kt'] = emi_diff * echo_mp.loc[ifacility,'calc_flow_mgd']/total_flow
    

# Step 4- Place data on CONUS grid
map_ind_mp_emis = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_ind_mp_emis_nongrid = np.zeros([num_years])

for iyear in np.arange(0, num_years):
    echo_temp = echo_mp[echo_mp['Year'] == year_range[iyear]]
    echo_temp.reset_index(inplace=True, drop=True)
    for ifacility in np.arange(0, len(echo_temp)):
        if echo_temp['Facility Longitude'][ifacility] > Lon_left and \
            echo_temp['Facility Longitude'][ifacility] < Lon_right and \
            echo_temp['Facility Latitude'][ifacility] > Lat_low and \
            echo_temp['Facility Latitude'][ifacility] < Lat_up:

            ilat = int((echo_temp['Facility Latitude'][ifacility] - Lat_low)/Res01)
            ilon = int((echo_temp['Facility Longitude'][ifacility] - Lon_left)/Res01)
            map_ind_mp_emis[ilat,ilon,iyear] += echo_temp['emis_kt'][ifacility]
        else:
            map_ind_mp_emis_nongrid[iyear] += echo_temp['emis_kt'][ifacility]
    print('Year',year_range[iyear],'Emissions (kt):',np.sum(map_ind_mp_emis[:,:,iyear])+map_ind_mp_emis_nongrid[iyear])
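
The Step 1 matching used for each sector compares every ECHO facility with every GHGRP facility and accepts pairs from the same year that lie within 0.025 degrees of each other (a distance in latitude/longitude space, not kilometres). A minimal sketch of that test with NumPy broadcasting follows. The function `match_by_location` and the toy coordinates are hypothetical.

```python
import numpy as np

def match_by_location(echo_lat, echo_lon, echo_year,
                      ghgrp_lat, ghgrp_lon, ghgrp_year, max_dist=0.025):
    """Boolean matrix [n_echo, n_ghgrp]: True where a pair shares a year and is closer than max_dist degrees."""
    echo_lat, echo_lon, echo_year = (np.asarray(a) for a in (echo_lat, echo_lon, echo_year))
    ghgrp_lat, ghgrp_lon, ghgrp_year = (np.asarray(a) for a in (ghgrp_lat, ghgrp_lon, ghgrp_year))
    # pairwise Euclidean distance in degrees, as in the nested loop
    dist = np.hypot(echo_lat[:, None] - ghgrp_lat[None, :],
                    echo_lon[:, None] - ghgrp_lon[None, :])
    same_year = echo_year[:, None] == ghgrp_year[None, :]
    return (dist < max_dist) & same_year

# toy facilities (hypothetical values)
pairs_demo = match_by_location(
    echo_lat=[40.0, 35.0], echo_lon=[-90.0, -80.0], echo_year=[2012, 2012],
    ghgrp_lat=[40.01, 35.0], ghgrp_lon=[-90.01, -80.0], ghgrp_year=[2012, 2013])
echo_matched_demo = pairs_demo.any(axis=1)  # plays the role of 'ghgrp_match'
ghgrp_found_demo = pairs_demo.any(axis=0)   # plays the role of 'found'
```

The sketch only returns the match mask. When several GHGRP records match one ECHO facility, the loops above keep the emissions of the last matching record, so a full replacement would need to reproduce that choice explicitly.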

##### Step 2.3.3.4. Fruit and Vegetables

In [None]:
# Step 1 - try to match facilities to ghgrp based on location 
# If there is no match with a GHGRP facility, add it to the full list of facilities
echo_fv['ghgrp_match'] = 0
echo_fv['emis_kt'] = 0
echo_fv['EF'] = 0
ghgrp_fv.loc[:,'found']=0

for ifacility in np.arange(0,len(echo_fv)):
    for ighgrp in np.arange(0,len(ghgrp_fv)):
        dist = np.sqrt((ghgrp_fv.loc[ighgrp,'LATITUDE']-echo_fv.loc[ifacility,'Facility Latitude'])**2\
                       +(ghgrp_fv.loc[ighgrp,'LONGITUDE']-echo_fv.loc[ifacility,'Facility Longitude'])**2)
        if dist < 0.025 and ghgrp_fv.loc[ighgrp,'Year'] == echo_fv.loc[ifacility,'Year']:
            ghgrp_fv.loc[ighgrp,'found'] = 1
            echo_fv.loc[ifacility,'ghgrp_match'] = 1
            echo_fv.loc[ifacility,'emis_kt'] = ghgrp_fv.loc[ighgrp,'emis_kt_tot']
        else:
            continue
print('Found (%):',100*np.sum(echo_fv['ghgrp_match']/len(echo_fv)))
print('GHGRP not found:',len(ghgrp_fv[ghgrp_fv['found']==0]))
print('GHGRP found:',len(ghgrp_fv[ghgrp_fv['found']==1]))
print('Total Emis (kt):',np.sum(echo_fv['emis_kt']))


# Step 2 - add additional GHGRP facilities (to capture all GHGRP emissions - deviates from GHGI methods)
# for each extra facility in the ghgrp dataset, append an extra row, then fill in the emissions values for each year
for ifacility in np.arange(0,len(ghgrp_fv)):
    if ghgrp_fv.loc[ifacility,'found'] ==0:
        facility_id = ghgrp_fv.loc[ifacility,'Facility_ID']
        df2 = {'Year':ghgrp_fv.loc[ifacility,'Year'],'State':ghgrp_fv.loc[ifacility,'STATE'], 'City':ghgrp_fv.loc[ifacility,'CITY'],\
               'County':ghgrp_fv.loc[ifacility,'COUNTY'],'ghgrp_match': 1, 'Facility Latitude': ghgrp_fv.loc[ifacility,'LATITUDE'], \
               'Facility Longitude':ghgrp_fv.loc[ifacility,'LONGITUDE'], 'EF':0,\
               'calc_flow_mgd':0,'emis_kt':ghgrp_fv.loc[ifacility,'emis_kt_tot']}
        echo_fv = pd.concat([echo_fv, pd.DataFrame([df2])], ignore_index=True) #DataFrame.append was removed in pandas 2.0
        ghgrp_fv.loc[ifacility,'found'] =1

        
#Step 3 -  distribute remaining emissions difference based on the relative wastewater flow at each facility
for iyear in np.arange(0, num_years):
    epa_emi  = EPA_emi_ind_ww_CH4.loc[EPA_emi_ind_ww_CH4['Source'] == 'Fruits and Vegtables',year_range[iyear]].values[0]
    sum_emi = np.sum(echo_fv.loc[echo_fv['Year']==year_range[iyear],'emis_kt'])
    emi_diff = max((epa_emi-sum_emi),0)
    print(emi_diff)
    total_flow = np.sum(echo_fv.loc[(echo_fv['Year']==year_range[iyear]) & (echo_fv['ghgrp_match']==0),'calc_flow_mgd'])
    for ifacility in np.arange(0, len(echo_fv)):
        if echo_fv.loc[ifacility,'Year']==year_range[iyear] and echo_fv.loc[ifacility,'ghgrp_match']==0:
            echo_fv.loc[ifacility,'emis_kt'] = emi_diff * echo_fv.loc[ifacility,'calc_flow_mgd']/total_flow
    

# Step 4- Place data on CONUS grid
map_ind_fv_emis = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_ind_fv_emis_nongrid = np.zeros([num_years])

for iyear in np.arange(0, num_years):
    echo_tefv = echo_fv[echo_fv['Year'] == year_range[iyear]]
    echo_tefv.reset_index(inplace=True, drop=True)
    for ifacility in np.arange(0, len(echo_tefv)):
        if echo_tefv['Facility Longitude'][ifacility] > Lon_left and \
            echo_tefv['Facility Longitude'][ifacility] < Lon_right and \
            echo_tefv['Facility Latitude'][ifacility] > Lat_low and \
            echo_tefv['Facility Latitude'][ifacility] < Lat_up:

            ilat = int((echo_tefv['Facility Latitude'][ifacility] - Lat_low)/Res01)
            ilon = int((echo_tefv['Facility Longitude'][ifacility] - Lon_left)/Res01)
            map_ind_fv_emis[ilat,ilon,iyear] += echo_tefv['emis_kt'][ifacility]
        else:
            map_ind_fv_emis_nongrid[iyear] += echo_tefv['emis_kt'][ifacility]
    print('Year',year_range[iyear],'Emissions (kt):',np.sum(map_ind_fv_emis[:,:,iyear])+map_ind_fv_emis_nongrid[iyear])

##### Step 2.3.3.5. Ethanol Production

In [None]:
# Step 1 - try to match facilities to ghgrp based on location 
# If there is no match with a GHGRP facility, add it to the full list of facilities
echo_eth['ghgrp_match'] = 0
echo_eth['emis_kt'] = 0
echo_eth['EF'] = 0
ghgrp_eth.loc[:,'found']=0

for ifacility in np.arange(0,len(echo_eth)):
    for ighgrp in np.arange(0,len(ghgrp_eth)):
        dist = np.sqrt((ghgrp_eth.loc[ighgrp,'LATITUDE']-echo_eth.loc[ifacility,'Facility Latitude'])**2\
                       +(ghgrp_eth.loc[ighgrp,'LONGITUDE']-echo_eth.loc[ifacility,'Facility Longitude'])**2)
        if dist < 0.025 and ghgrp_eth.loc[ighgrp,'Year'] == echo_eth.loc[ifacility,'Year']:
            ghgrp_eth.loc[ighgrp,'found'] = 1
            echo_eth.loc[ifacility,'ghgrp_match'] = 1
            echo_eth.loc[ifacility,'emis_kt'] = abs(ghgrp_eth.loc[ighgrp,'emis_kt_tot']) #some ghgrp data were negative
        else:
            continue
print('Found (%):',100*np.sum(echo_eth['ghgrp_match']/len(echo_eth)))
print('GHGRP not found:',len(ghgrp_eth[ghgrp_eth['found']==0]))
print('GHGRP found:',len(ghgrp_eth[ghgrp_eth['found']==1]))
print('Total Emis (kt):',np.sum(echo_eth['emis_kt']))



# Step 2 - add additional GHGRP facilities (to capture all GHGRP emissions - deviates from GHGI methods)
# for each extra facility in the ghgrp dataset, append an extra row, then fill in the emissions values for each year
for ifacility in np.arange(0,len(ghgrp_eth)):
    if ghgrp_eth.loc[ifacility,'found'] ==0:
        facility_id = ghgrp_eth.loc[ifacility,'Facility_ID']
        df2 = {'Year':ghgrp_eth.loc[ifacility,'Year'],'State':ghgrp_eth.loc[ifacility,'STATE'], 'City':ghgrp_eth.loc[ifacility,'CITY'],\
               'County':ghgrp_eth.loc[ifacility,'COUNTY'],'ghgrp_match': 1, 'Facility Latitude': ghgrp_eth.loc[ifacility,'LATITUDE'], \
               'Facility Longitude':ghgrp_eth.loc[ifacility,'LONGITUDE'], 'EF':0,\
               'calc_flow_mgd':0,'emis_kt':ghgrp_eth.loc[ifacility,'emis_kt_tot']}
        echo_eth = pd.concat([echo_eth, pd.DataFrame([df2])], ignore_index=True) #DataFrame.append was removed in pandas 2.0
        ghgrp_eth.loc[ifacility,'found'] =1

        
#Step 3 -  distribute remaining emissions difference based on the relative wastewater flow at each facility
for iyear in np.arange(0, num_years):
    epa_emi  = EPA_emi_ind_ww_CH4.loc[EPA_emi_ind_ww_CH4['Source'] == 'Ethanol Production',year_range[iyear]].values[0]
    sum_emi = np.sum(echo_eth.loc[echo_eth['Year']==year_range[iyear],'emis_kt'])
    emi_diff = max((epa_emi-sum_emi),0)
    print(emi_diff)
    total_flow = np.sum(echo_eth.loc[(echo_eth['Year']==year_range[iyear]) & (echo_eth['ghgrp_match']==0),'calc_flow_mgd'])
    for ifacility in np.arange(0, len(echo_eth)):
        if echo_eth.loc[ifacility,'Year']==year_range[iyear] and echo_eth.loc[ifacility,'ghgrp_match']==0:
            echo_eth.loc[ifacility,'emis_kt'] = emi_diff * echo_eth.loc[ifacility,'calc_flow_mgd']/total_flow

# Step 4- Place data on CONUS grid
map_ind_eth_emis = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_ind_eth_emis_nongrid = np.zeros([num_years])

for iyear in np.arange(0, num_years):
    echo_temp = echo_eth[echo_eth['Year'] == year_range[iyear]]
    echo_temp.reset_index(inplace=True, drop=True)
    for ifacility in np.arange(0, len(echo_temp)):
        if echo_temp['Facility Longitude'][ifacility] > Lon_left and \
            echo_temp['Facility Longitude'][ifacility] < Lon_right and \
            echo_temp['Facility Latitude'][ifacility] > Lat_low and \
            echo_temp['Facility Latitude'][ifacility] < Lat_up:

            ilat = int((echo_temp['Facility Latitude'][ifacility] - Lat_low)/Res01)
            ilon = int((echo_temp['Facility Longitude'][ifacility] - Lon_left)/Res01)
            map_ind_eth_emis[ilat,ilon,iyear] += echo_temp['emis_kt'][ifacility]
        else:
            map_ind_eth_emis_nongrid[iyear] += echo_temp['emis_kt'][ifacility]
    print('Year',year_range[iyear],'Emissions (kt):',np.sum(map_ind_eth_emis[:,:,iyear])+map_ind_eth_emis_nongrid[iyear])
    

##### Step 2.3.3.6. Breweries

In [None]:
# Step 1 - try to match facilities to ghgrp based on location 
# If there is no match with a GHGRP facility, add it to the full list of facilities
echo_brew['ghgrp_match'] = 0
echo_brew['emis_kt'] = 0
echo_brew['EF'] = 0
ghgrp_brew.loc[:,'found']=0

for ifacility in np.arange(0,len(echo_brew)):
    for ighgrp in np.arange(0,len(ghgrp_brew)):
        dist = np.sqrt((ghgrp_brew.loc[ighgrp,'LATITUDE']-echo_brew.loc[ifacility,'Facility Latitude'])**2\
                       +(ghgrp_brew.loc[ighgrp,'LONGITUDE']-echo_brew.loc[ifacility,'Facility Longitude'])**2)
        if dist < 0.025 and ghgrp_brew.loc[ighgrp,'Year'] == echo_brew.loc[ifacility,'Year']:
            ghgrp_brew.loc[ighgrp,'found'] = 1
            echo_brew.loc[ifacility,'ghgrp_match'] = 1
            echo_brew.loc[ifacility,'emis_kt'] = ghgrp_brew.loc[ighgrp,'emis_kt_tot']
        else:
            continue
print('Found (%):',100*np.sum(echo_brew['ghgrp_match']/len(echo_brew)))
print('GHGRP not found:',len(ghgrp_brew[ghgrp_brew['found']==0]))
print('GHGRP found:',len(ghgrp_brew[ghgrp_brew['found']==1]))
print('Total Emis (kt):',np.sum(echo_brew['emis_kt']))


# Step 2 - add additional GHGRP facilities (to capture all GHGRP emissions - deviates from GHGI methods)
# for each extra facility in the ghgrp dataset, append an extra row, then fill in the emissions values for each year
for ifacility in np.arange(0,len(ghgrp_brew)):
    if ghgrp_brew.loc[ifacility,'found'] ==0:
        facility_id = ghgrp_brew.loc[ifacility,'Facility_ID']
        df2 = {'Year':ghgrp_brew.loc[ifacility,'Year'],'State':ghgrp_brew.loc[ifacility,'STATE'], 'City':ghgrp_brew.loc[ifacility,'CITY'],\
               'County':ghgrp_brew.loc[ifacility,'COUNTY'],'ghgrp_match': 1, 'Facility Latitude': ghgrp_brew.loc[ifacility,'LATITUDE'], \
               'Facility Longitude':ghgrp_brew.loc[ifacility,'LONGITUDE'], 'EF':0,\
               'calc_flow_mgd':0,'emis_kt':ghgrp_brew.loc[ifacility,'emis_kt_tot']}
        echo_brew = pd.concat([echo_brew, pd.DataFrame([df2])], ignore_index=True) #DataFrame.append was removed in pandas 2.0
        ghgrp_brew.loc[ifacility,'found'] =1

        
#Step 3 -  distribute remaining emissions difference based on the relative wastewater flow at each facility
for iyear in np.arange(0, num_years):
    epa_emi  = EPA_emi_ind_ww_CH4.loc[EPA_emi_ind_ww_CH4['Source'] == 'Breweries',year_range[iyear]].values[0]
    sum_emi = np.sum(echo_brew.loc[echo_brew['Year']==year_range[iyear],'emis_kt'])
    emi_diff = max((epa_emi-sum_emi),0)
    print(emi_diff)
    total_flow = np.sum(echo_brew.loc[(echo_brew['Year']==year_range[iyear]) & (echo_brew['ghgrp_match']==0),'calc_flow_mgd'])
    for ifacility in np.arange(0, len(echo_brew)):
        if echo_brew.loc[ifacility,'Year']==year_range[iyear] and echo_brew.loc[ifacility,'ghgrp_match']==0:
            echo_brew.loc[ifacility,'emis_kt'] = emi_diff * echo_brew.loc[ifacility,'calc_flow_mgd']/total_flow
    

# Step 4- Place data on CONUS grid
map_ind_brew_emis = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_ind_brew_emis_nongrid = np.zeros([num_years])

for iyear in np.arange(0, num_years):
    echo_temp = echo_brew[echo_brew['Year'] == year_range[iyear]]
    echo_temp.reset_index(inplace=True, drop=True)
    for ifacility in np.arange(0, len(echo_temp)):
        if echo_temp['Facility Longitude'][ifacility] > Lon_left and \
            echo_temp['Facility Longitude'][ifacility] < Lon_right and \
            echo_temp['Facility Latitude'][ifacility] > Lat_low and \
            echo_temp['Facility Latitude'][ifacility] < Lat_up:

            ilat = int((echo_temp['Facility Latitude'][ifacility] - Lat_low)/Res01)
            ilon = int((echo_temp['Facility Longitude'][ifacility] - Lon_left)/Res01)
            map_ind_brew_emis[ilat,ilon,iyear] += echo_temp['emis_kt'][ifacility]
        else:
            map_ind_brew_emis_nongrid[iyear] += echo_temp['emis_kt'][ifacility]
    print('Year',year_range[iyear],'Emissions (kt):',np.sum(map_ind_brew_emis[:,:,iyear])+map_ind_brew_emis_nongrid[iyear])

##### Step 2.3.3.7. Petroleum Refining

In [None]:
#For Petroleum Refining - GHGRP emissions are from Subpart Y. There are no 
# Step 1 - try to match facilities to ghgrp based on location 
# If there is no match with a GHGRP facility, add it to the full list of facilities
echo_ref['ghgrp_match'] = 0
echo_ref['emis_kt'] = 0
echo_ref['EF'] = 0
ghgrp_ref.loc[:,'found']=0

for ifacility in np.arange(0,len(echo_ref)):
    for ighgrp in np.arange(0,len(ghgrp_ref)):
        dist = np.sqrt((ghgrp_ref.loc[ighgrp,'LATITUDE']-echo_ref.loc[ifacility,'Facility Latitude'])**2\
                       +(ghgrp_ref.loc[ighgrp,'LONGITUDE']-echo_ref.loc[ifacility,'Facility Longitude'])**2)
        if dist < 0.025 and ghgrp_ref.loc[ighgrp,'Year'] == echo_ref.loc[ifacility,'Year']:
            ghgrp_ref.loc[ighgrp,'found'] = 1
            echo_ref.loc[ifacility,'ghgrp_match'] = 1
            echo_ref.loc[ifacility,'emis_kt'] = ghgrp_ref.loc[ighgrp,'emis_kt_tot']
        else:
            continue
print('Found (%):',100*np.sum(echo_ref['ghgrp_match']/len(echo_ref)))
print('GHGRP not found:',len(ghgrp_ref[ghgrp_ref['found']==0]))
print('GHGRP found:',len(ghgrp_ref[ghgrp_ref['found']==1]))
print('Total Emis (kt):',np.sum(echo_ref['emis_kt']))


# Step 2 - add additional GHGRP facilities (to capture all GHGRP emissions - deviates from GHGI methods)
# for each extra facility in the ghgrp dataset, append an extra row, then fill in the emissions values for each year
for ifacility in np.arange(0,len(ghgrp_ref)):
    if ghgrp_ref.loc[ifacility,'found'] ==0:
        facility_id = ghgrp_ref.loc[ifacility,'Facility_ID']
        df2 = {'Year':ghgrp_ref.loc[ifacility,'Year'],'State':ghgrp_ref.loc[ifacility,'STATE'], 'City':ghgrp_ref.loc[ifacility,'CITY'],\
               'County':ghgrp_ref.loc[ifacility,'COUNTY'],'ghgrp_match': 1, 'Facility Latitude': ghgrp_ref.loc[ifacility,'LATITUDE'], \
               'Facility Longitude':ghgrp_ref.loc[ifacility,'LONGITUDE'], 'EF':0,\
               'calc_flow_mgd':0,'emis_kt':ghgrp_ref.loc[ifacility,'emis_kt_tot']}
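        # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
        echo_ref = pd.concat([echo_ref, pd.DataFrame([df2])], ignore_index = True)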
        ghgrp_ref.loc[ifacility,'found'] =1

        
#Step 3 -  distribute remaining emissions difference based on the relative wastewater flow at each facility
for iyear in np.arange(0, num_years):
    epa_emi  = EPA_emi_ind_ww_CH4.loc[EPA_emi_ind_ww_CH4['Source'] == 'Petroleum Refining',year_range[iyear]].values[0]
    sum_emi = np.sum(echo_ref.loc[echo_ref['Year']==year_range[iyear],'emis_kt'])
    emi_diff = max((epa_emi-sum_emi),0)
    print(emi_diff)
    total_flow = np.sum(echo_ref.loc[(echo_ref['Year']==year_range[iyear]) & (echo_ref['ghgrp_match']==0),'calc_flow_mgd'])
    for ifacility in np.arange(0, len(echo_ref)):
        if echo_ref.loc[ifacility,'Year']==year_range[iyear] and echo_ref.loc[ifacility,'ghgrp_match']==0:
            echo_ref.loc[ifacility,'emis_kt'] = emi_diff * echo_ref.loc[ifacility,'calc_flow_mgd']/total_flow
    

# Step 4- Place data on CONUS grid
map_ind_ref_emis = np.zeros([len(Lat_01),len(Lon_01),num_years])
map_ind_ref_emis_nongrid = np.zeros([num_years])

for iyear in np.arange(0, num_years):
    echo_temp = echo_ref[echo_ref['Year'] == year_range[iyear]]
    echo_temp.reset_index(inplace=True, drop=True)
    for ifacility in np.arange(0, len(echo_temp)):
        if echo_temp['Facility Longitude'][ifacility] > Lon_left and \
            echo_temp['Facility Longitude'][ifacility] < Lon_right and \
            echo_temp['Facility Latitude'][ifacility] > Lat_low and \
            echo_temp['Facility Latitude'][ifacility] < Lat_up:

            ilat = int((echo_temp['Facility Latitude'][ifacility] - Lat_low)/Res01)
            ilon = int((echo_temp['Facility Longitude'][ifacility] - Lon_left)/Res01)
            map_ind_ref_emis[ilat,ilon,iyear] += echo_temp['emis_kt'][ifacility]
        else:
            map_ind_ref_emis_nongrid[iyear] += echo_temp['emis_kt'][ifacility]
    print('Year',year_range[iyear],'Emissions (kt):',np.sum(map_ind_ref_emis[:,:,iyear])+map_ind_ref_emis_nongrid[iyear])
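
Steps 1 and 3 of the cell above (location matching and flow-weighted distribution of the remaining emissions) can be sketched in vectorized form. This is an illustrative sketch only: `match_by_distance` and `distribute_residual` are hypothetical helpers, the coordinates and emissions are made up, and both facility lists are assumed to be already filtered to a single year. Unlike the notebook loop, which keeps the last GHGRP facility found within 0.025 degrees, this version keeps the nearest one. It also returns zeros when the unmatched facilities report no flow, a case in which the loop above would divide by zero.

```python
import numpy as np

def match_by_distance(lat_a, lon_a, lat_b, lon_b, max_deg=0.025):
    """For each point in A, index of the nearest point in B within max_deg, else -1."""
    lat_a = np.asarray(lat_a, dtype=float)[:, None]
    lon_a = np.asarray(lon_a, dtype=float)[:, None]
    lat_b = np.asarray(lat_b, dtype=float)[None, :]
    lon_b = np.asarray(lon_b, dtype=float)[None, :]
    # Distance in degrees (same simple metric as the notebook)
    dist = np.sqrt((lat_a - lat_b) ** 2 + (lon_a - lon_b) ** 2)
    nearest = dist.argmin(axis=1)
    within = dist[np.arange(dist.shape[0]), nearest] < max_deg
    return np.where(within, nearest, -1)

def distribute_residual(national_total, reported_emis, flow_unmatched):
    """Spread max(national - reported, 0) over unmatched facilities by flow share."""
    residual = max(national_total - float(np.sum(reported_emis)), 0.0)
    flow = np.asarray(flow_unmatched, dtype=float)
    total_flow = flow.sum()
    if total_flow <= 0:
        return np.zeros_like(flow)
    return residual * flow / total_flow

# Made-up single-year example
echo_lat, echo_lon = [40.0, 35.0, 30.0], [-100.0, -90.0, -80.0]
ghgrp_lat, ghgrp_lon = [40.01, 29.0], [-100.01, -80.0]
ghgrp_emis = np.array([2.0, 1.0])      # kt, reported
echo_flow = np.array([5.0, 3.0, 4.0])  # MGD

match = match_by_distance(echo_lat, echo_lon, ghgrp_lat, ghgrp_lon)
unmatched = match == -1

# All GHGRP emissions are kept (matched or appended), so the residual is
# the national total minus the full reported sum
alloc = distribute_residual(10.0, ghgrp_emis, echo_flow[unmatched])
print(match, alloc)
```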

-----------
## Step 3. Read in and Format US EPA GHGI Emissions
----------

In [None]:
#Read in the emissions data from the GHGI workbook (in kt)

names = pd.read_excel(EPA_ww_inputfile, sheet_name = "Dom Calcs", usecols = "B:AE",skiprows = 14, header = 0)
colnames = names.columns.values
EPA_emi_dom_ww_CH4 = pd.read_excel(EPA_ww_inputfile, sheet_name = "Dom Calcs", usecols = "B:AE", skiprows = 14, names = colnames)
#replace missing values with '' and remove parentheses from the source names
EPA_emi_dom_ww_CH4 = EPA_emi_dom_ww_CH4.fillna('')
EPA_emi_dom_ww_CH4.rename(columns={EPA_emi_dom_ww_CH4.columns[0]:'Source'}, inplace=True)
#regex=False treats the parentheses as literal characters in all pandas versions
EPA_emi_dom_ww_CH4['Source']= EPA_emi_dom_ww_CH4['Source'].str.replace("(","", regex=False)
EPA_emi_dom_ww_CH4['Source']= EPA_emi_dom_ww_CH4['Source'].str.replace(")","", regex=False)
EPA_emi_dom_ww_CH4 = EPA_emi_dom_ww_CH4.drop(columns = [n for n in range(1990, start_year,1)])

names = pd.read_excel(EPA_ww_inputfile, sheet_name = "InvDB", usecols = "E:AJ",skiprows = 15, header = 0)
colnames = names.columns.values
EPA_emi_ind_ww_CH4 = pd.read_excel(EPA_ww_inputfile, sheet_name = "InvDB", usecols = "E:AJ", skiprows = 15, names = colnames)
#replace missing values with '' and remove parentheses from the source names
EPA_emi_ind_ww_CH4 = EPA_emi_ind_ww_CH4.fillna('')
EPA_emi_ind_ww_CH4.rename(columns={EPA_emi_ind_ww_CH4.columns[0]:'Source'}, inplace=True)
#regex=False treats the parentheses as literal characters in all pandas versions
EPA_emi_ind_ww_CH4['Source']= EPA_emi_ind_ww_CH4['Source'].str.replace("(","", regex=False)
EPA_emi_ind_ww_CH4['Source']= EPA_emi_ind_ww_CH4['Source'].str.replace(")","", regex=False)
EPA_emi_ind_ww_CH4.drop(EPA_emi_ind_ww_CH4.columns[[1,2]], axis =1, inplace= True)
EPA_emi_ind_ww_CH4 = EPA_emi_ind_ww_CH4.drop(columns = [n for n in range(1990, start_year,1)])
EPA_emi_ind_ww_CH4.iloc[:,1:] = EPA_emi_ind_ww_CH4.iloc[:,1:]*1000 #convert Tg to kt
EPA_emi_ww_CH4 = pd.concat([EPA_emi_dom_ww_CH4, EPA_emi_ind_ww_CH4]) #DataFrame.append was removed in pandas 2.0
EPA_emi_ww_CH4.reset_index(inplace=True, drop=True)

#calculate national total values
temp = EPA_emi_ind_ww_CH4.sum(axis=0)
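#append the column sums as a new row (DataFrame.append was removed in pandas 2.0)
EPA_emi_ind_ww_CH4 = pd.concat([EPA_emi_ind_ww_CH4, temp.to_frame().T.infer_objects()], ignore_index=True)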
EPA_emi_ind_ww_CH4.iloc[-1,0] = 'Total'
EPA_emi_ww_total = EPA_emi_ind_ww_CH4[EPA_emi_ind_ww_CH4['Source'] == 'Total']
display(EPA_emi_ww_total)
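
The formatting steps in the cell above (strip parentheses from the source labels, drop the years before `start_year`, append a national total row) can be tried on a small table without the GHGI workbook. The sketch below is illustrative only: the source labels and values are made up, `start_year = 2012` is an assumption taken from the notebook's stated 2012-2018 range, and `pd.concat` is used in place of `DataFrame.append`, which is no longer available in pandas 2.0 and later.

```python
import pandas as pd

start_year = 2012  # assumed first inventory year

# Made-up stand-in for one sheet of the workbook (values in kt)
raw = pd.DataFrame({
    "Source": ["Pulp and Paper (kt)", "Petroleum Refining"],
    1990: [1.0, 2.0],
    2012: [0.5, 0.25],
    2013: [0.75, 0.5],
})

clean = raw.copy()
# regex=False removes the literal characters regardless of pandas version
clean["Source"] = (clean["Source"]
                   .str.replace("(", "", regex=False)
                   .str.replace(")", "", regex=False))

# Drop the years that precede the inventory period
early_years = [y for y in range(1990, start_year) if y in clean.columns]
clean = clean.drop(columns=early_years)

# Sum the numeric year columns and append the result as a 'Total' row
totals = clean.drop(columns="Source").sum(axis=0)
total_row = pd.DataFrame([{"Source": "Total", **totals.to_dict()}])
with_total = pd.concat([clean, total_row], ignore_index=True)
print(with_total)
```

Summing only the year columns keeps the `Source` labels out of the total, so the `Total` row stays numeric.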

#### Step 3.2. Split Emissions into Gridding Groups

In [None]:
#split GHG emissions into gridding groups, based on the Wastewater Proxy Mapping file

DEBUG =1
start_year_idx = EPA_emi_ww_CH4.columns.get_loc((start_year))
end_year_idx = EPA_emi_ww_CH4.columns.get_loc((end_year))+1
ghgi_wwt_groups = ghgi_wwt_map['GHGI_Emi_Group'].unique()
sum_emi = np.zeros([num_years])

for igroup in np.arange(0,len(ghgi_wwt_groups)): #loop through all groups, finding the GHGI sources in that group and summing emissions for each year
    ##DEBUG## print(ghgi_wwt_groups[igroup])
    vars()[ghgi_wwt_groups[igroup]] = np.zeros([num_years])
    source_temp = ghgi_wwt_map.loc[ghgi_wwt_map['GHGI_Emi_Group'] == ghgi_wwt_groups[igroup], 'GHGI_Source']
    pattern_temp  = '|'.join(source_temp) 
    emi_temp =EPA_emi_ww_CH4[EPA_emi_ww_CH4['Source'].str.contains(pattern_temp)]
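    #restrict the sum to the start_year..end_year columns so the result has num_years entries
    vars()[ghgi_wwt_groups[igroup]][:] = emi_temp.iloc[:,start_year_idx:end_year_idx].sum()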
        
        
#Check against total summary emissions 
print('QA/QC #1: Check Processing Emission Sum against GHGI Summary Emissions')
for iyear in np.arange(0,num_years): 
    for igroup in np.arange(0,len(ghgi_wwt_groups)):
        sum_emi[iyear] += vars()[ghgi_wwt_groups[igroup]][iyear]
        
    summary_emi = EPA_emi_ww_total.iloc[0,iyear+1]  
    #Check 1 - make sure that the sum over all gridding groups equals the reported national total
    diff1 = abs(sum_emi[iyear] - summary_emi)/((sum_emi[iyear] + summary_emi)/2)
    if DEBUG==1:
        print(summary_emi)
        print(sum_emi[iyear])
    if diff1 < 0.0001:
        print('Year ', year_range[iyear],': PASS, difference < 0.01%')
    else:
        print('Year ', year_range[iyear],': FAIL (check Dom Calcs & InvDB sheets): ', 100*diff1,'%') 
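
The grouping and QA/QC check above can be illustrated on toy data. The sketch below is an assumption-laden illustration, not the notebook's code: the source names, group names, and values are made up, and the group totals are held in a dictionary instead of being created as variables through `vars()`. The check uses the same symmetric relative difference, |a - b| / ((a + b) / 2), with the same 0.01% threshold.

```python
import numpy as np

def relative_difference(a, b):
    """Symmetric relative difference |a-b| / mean(a, b); 0 when both values are 0."""
    mean = (a + b) / 2.0
    return 0.0 if mean == 0 else abs(a - b) / mean

# Made-up yearly emissions (kt) per GHGI source
source_emis = {
    "Source A": np.array([1.0, 2.0]),
    "Source B": np.array([0.5, 0.5]),
    "Source C": np.array([3.0, 1.0]),
}
# Made-up mapping of gridding groups to GHGI sources
group_map = {"Group_1": ["Source A", "Source B"], "Group_2": ["Source C"]}

# One array of yearly totals per group, keyed by group name
group_emis = {g: sum(source_emis[s] for s in srcs) for g, srcs in group_map.items()}

# Compare the sum over groups with the (made-up) reported national totals
summary = np.array([4.5, 3.5])
sum_emi = sum(group_emis.values())
status = []
for iyear in range(len(summary)):
    diff = relative_difference(sum_emi[iyear], summary[iyear])
    status.append("PASS" if diff < 0.0001 else "FAIL")
print(status)
```

A dictionary keeps the group arrays in one place and avoids name collisions with other notebook variables, at the cost of changing how later cells look the groups up.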

--------------
## Step 4. Grid Data
-------------

#### Step 4.1. Allocate emissions

##### Step 4.1.1 Assign the Appropriate Proxy Variable Names (state & grid)

In [None]:
# The names on the *left* need to match the 'ProxyMapping' 'State_Proxy_Group' names 
# (these are initialized in Step 2). 
# The names on the *right* are the variable names used to calculate the proxies in this code.
# Names on the right need to match those from the code in Step 2

#national --> grid (0.1) proxies (lat x lon x year)
Map_pop = map_pop
Map_Dom_Aero = map_dom_cw
Map_Dom_Anaero = map_dom_anaero
Map_Dom_AD = map_dom_ad
Map_Dom_WWF = map_dom_wwf
Map_PP = map_ind_pp_emis
Map_MP =map_ind_mp_emis
Map_FV = map_ind_fv_emis
Map_Brew = map_ind_brew_emis
Map_Ethanol = map_ind_eth_emis
Map_PetrRef = map_ind_ref_emis

Map_pop_nongrid = np.zeros([num_years])
Map_Dom_Aero_nongrid = map_dom_cw_nongrid
Map_Dom_Anaero_nongrid = map_dom_anaero_nongrid
Map_Dom_AD_nongrid = map_dom_ad_nongrid
Map_Dom_WWF_nongrid = map_dom_wwf_nongrid
Map_PP_nongrid= map_ind_pp_emis_nongrid
Map_MP_nongrid= map_ind_mp_emis_nongrid
Map_FV_nongrid = map_ind_fv_emis_nongrid
Map_Brew_nongrid = map_ind_brew_emis_nongrid
Map_Ethanol_nongrid= map_ind_eth_emis_nongrid
Map_PetrRef_nongrid= map_ind_ref_emis_nongrid 


##### Step 4.1.2 Allocate emissions to the CONUS region (0.1x0.1)

In [None]:
# Allocate national emissions (kt) onto a 0.1x0.1 grid using gridcell level 'Proxy_Groups'

DEBUG =1
#Define emission arrays
Emissions_array_01 = np.zeros([len(Lat_01),len(Lon_01),num_years])

Emissions_array_01_dom = np.zeros([len(Lat_01),len(Lon_01),num_years])

Emissions_array_01_ind = np.zeros([len(Lat_01),len(Lon_01),num_years])

Emissions_nongrid = np.zeros([num_years])

# For each year, (2a) [NOT RELEVANT TO THIS SOURCE] distribute state-level emissions onto a grid using proxies defined above ....
# To speed up the code, masks are used rather than looping individually through each lat/lon. 
# In this case, a mask of 1's is made for the grid cells that match the ANSI values for a given state
# The masked values are set to zero, remaining values = 1. 
# AK and HI and territories are removed from the analysis at this stage. 
# The emissions allocated to each state are at 0.01x0.01 degree resolution, as required to calculate accurate 'mask'
# arrays for each state. 
# (2b) For emission groups that were not first allocated to states [RELEVANT HERE], national emissions for those groups are gridded
# based on the relevant gridded proxy arrays (0.1x0.1 resolution). These emissions are at 0.1x0.1 degrees resolution. 
# (2c) - record 'not mapped' emission groups in the 'non-grid' array (not relevant here)

print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
running_sum = np.zeros([len(proxy_wwt_map),num_years])

for igroup in np.arange(0,len(proxy_wwt_map)):
    proxy_temp = vars()[proxy_wwt_map.loc[igroup,'Proxy_Group']]
    proxy_temp_nongrid = vars()[proxy_wwt_map.loc[igroup,'Proxy_Group']+'_nongrid']
    #vars()['Ext_'+proxy_wwt_map.loc[igroup,'GHGI_Emi_Group']+'_01'] = np.zeros([len(lat001),len(lon001),num_years])
    vars()['Ext_'+proxy_wwt_map.loc[igroup,'GHGI_Emi_Group']] = np.zeros([len(Lat_01),len(Lon_01),num_years])
 
         
    #2b. if emissions were not allocated to state, allocate national total to grid here (these are in 0.1x0.1 resolution)
    if proxy_wwt_map.loc[igroup,'Proxy_Group'] != 'Map_not_mapped':
        for iyear in np.arange(0,num_years):
            temp_sum = np.sum(vars()[proxy_wwt_map.loc[igroup,'Proxy_Group']][:,:,iyear])+np.sum(vars()[proxy_wwt_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear])
            emi_temp = vars()[proxy_wwt_map.loc[igroup,'GHGI_Emi_Group']][iyear] * \
                   data_fn.safe_div(vars()[proxy_wwt_map.loc[igroup,'Proxy_Group']][:,:,iyear], temp_sum)
            Emissions_array_01[:,:,iyear] += emi_temp
            vars()['Ext_'+proxy_wwt_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear] += emi_temp
            #reuse emi_temp (computed above) for the domestic and industrial sub-totals
            if '_Dom_' in proxy_wwt_map.loc[igroup, 'GHGI_Emi_Group']:
                Emissions_array_01_dom[:,:,iyear] += emi_temp
            elif 'Ind' in proxy_wwt_map.loc[igroup, 'GHGI_Emi_Group']:
                Emissions_array_01_ind[:,:,iyear] += emi_temp
            
            #share of this group's emissions that falls outside the grid (from the '_nongrid' proxy)
            emi_temp_nongrid = vars()[proxy_wwt_map.loc[igroup,'GHGI_Emi_Group']][iyear] *\
                    data_fn.safe_div(vars()[proxy_wwt_map.loc[igroup,'Proxy_Group']+'_nongrid'][iyear], temp_sum)
            Emissions_nongrid[iyear] += emi_temp_nongrid
            ##DEBUG## running_count += vars()[proxy_wwt_map.loc[igroup,'GHGI_Emi_Group']][iyear]
            running_sum[igroup,iyear] += np.sum(emi_temp) + emi_temp_nongrid

    #2c. this is the case that GHGI emissions are not mapped (e.g., specified outside of CONUS in the GHGI)
    elif proxy_wwt_map.loc[igroup,'Proxy_Group'] == 'Map_not_mapped':  
        for iyear in np.arange(0, num_years):
            Emissions_nongrid[iyear] += vars()[proxy_wwt_map.loc[igroup,'GHGI_Emi_Group']][iyear]
            running_sum[igroup,iyear] += vars()[proxy_wwt_map.loc[igroup,'GHGI_Emi_Group']][iyear] 
    #print(running_sum[igroup,iyear])

for iyear in np.arange(0, num_years):    
    calc_emi = np.sum(Emissions_array_01[:,:,iyear]) + np.sum(Emissions_nongrid[iyear]) 
    calc_emi2 = np.sum(Emissions_array_01_dom[:,:,iyear]) +np.sum(Emissions_array_01_ind[:,:,iyear])+ \
                np.sum(Emissions_nongrid[iyear]) 
    calc_emi3 = 0
    for igroup in np.arange(0,len(proxy_wwt_map)):
        calc_emi3 += np.sum(vars()['Ext_'+proxy_wwt_map.loc[igroup,'GHGI_Emi_Group']][:,:,iyear])
    calc_emi3 += np.sum(Emissions_nongrid[iyear]) 
    summary_emi = EPA_emi_ww_total.iloc[0,iyear+1] 
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG==1:
        print(calc_emi)
        print(calc_emi2)
        print(calc_emi3)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
        
ct = datetime.now() 
print("current time:", ct)
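
The allocation in step (2b) above scales each group's national total by the proxy's share in every grid cell, with the off-grid proxy mass included in the denominator. The sketch below shows the arithmetic on a 2x2 toy grid. It is a sketch under assumptions: the local `safe_div` is a hypothetical stand-in for `data_fn.safe_div`, whose actual behavior is not shown in this notebook (here it returns zeros when the denominator is zero), and all values are made up.

```python
import numpy as np

def safe_div(num, den):
    """Hypothetical stand-in for data_fn.safe_div: zeros when the denominator is 0."""
    num = np.asarray(num, dtype=float)
    if den == 0:
        return np.zeros_like(num)
    return num / den

# Made-up proxy for one group and one year
proxy = np.array([[1.0, 3.0],
                  [0.0, 4.0]])   # on-grid proxy values
proxy_nongrid = 2.0              # proxy mass outside the CONUS grid
national = 20.0                  # national emissions for the group (kt)

# Denominator includes both on-grid and off-grid proxy mass
temp_sum = proxy.sum() + proxy_nongrid

gridded = national * safe_div(proxy, temp_sum)          # kt per grid cell
nongrid = national * safe_div(proxy_nongrid, temp_sum)  # kt outside the grid

# Mass balance: gridded plus off-grid emissions equal the national total
print(gridded.sum() + nongrid, national)
```

Including the off-grid proxy in `temp_sum` is what lets the QA/QC check compare `Emissions_array_01` plus `Emissions_nongrid` against the national total.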

##### Step 4.1.3 Save gridded emissions (kt)

In [None]:
#save gridded emissions for each gridding group - for extension

#Initialize file
data_IO_fn.initialize_netCDF(grid_emi_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)

unique_groups = np.unique(proxy_wwt_map['GHGI_Emi_Group'])
unique_groups = unique_groups[unique_groups != 'Emi_not_mapped']

nc_out = Dataset(grid_emi_outputfile, 'r+', format='NETCDF4')

for igroup in np.arange(0,len(unique_groups)):
    print('Ext_'+unique_groups[igroup])
    if len(np.shape(vars()['Ext_'+unique_groups[igroup]])) ==4:
        ghgi_temp = np.sum(vars()['Ext_'+unique_groups[igroup]],axis=3) #sum month data if data is monthly
    else:
        ghgi_temp = vars()['Ext_'+unique_groups[igroup]]

    # Write data to netCDF
    data_out = nc_out.createVariable('Ext_'+unique_groups[igroup], 'f8', ('lat', 'lon','year'), zlib=True)
    data_out[:,:,:] = ghgi_temp[:,:,:]

#save nongrid data to calculate non-grid fraction extension
data_out = nc_out.createVariable('Emissions_nongrid', 'f8', ('year'), zlib=True)  
data_out[:] = Emissions_nongrid[:]
nc_out.close()

#Confirm file location
print('** SUCCESS **')
print("Gridded emissions (kt) written to file: {}" .format(os.getcwd())+grid_emi_outputfile)
print(' ')

del data_out, ghgi_temp, nc_out

#### Step 4.2. Calculate Gridded Emission Fluxes (molec./cm2/s) (0.1x0.1)

In [None]:
#Convert emissions to emission flux
# convert kt to molec/cm2/s

Flux_array_01_annual = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_array_01_annual_dom = np.zeros([len(Lat_01),len(Lon_01),num_years])
Flux_array_01_annual_ind = np.zeros([len(Lat_01),len(Lon_01),num_years])
print('**QA/QC Check: Sum of national gridded emissions vs. GHGI national emissions')
  
for iyear in np.arange(0,num_years):
    calc_emi = 0
    if year_range[iyear]==2012 or year_range[iyear]==2016:
        year_days = np.sum(month_day_leap)
        #month_days = month_day_leap
    else:
        year_days = np.sum(month_day_nonleap)
        #month_days = month_day_nonleap
        
    #for imonth in np.arange(0,num_months):
    conversion_factor_01 = 10**9 * Avogadro / float(Molarch4 *year_days * 24 * 60 *60) / area_matrix_01
    Flux_array_01_annual[:,:,iyear] = Emissions_array_01[:,:,iyear]*conversion_factor_01
    Flux_array_01_annual_dom[:,:,iyear] = Emissions_array_01_dom[:,:,iyear]*conversion_factor_01
    Flux_array_01_annual_ind[:,:,iyear] = Emissions_array_01_ind[:,:,iyear]*conversion_factor_01
    #convert back to mass to check
    conversion_factor_annual = 10**9 * Avogadro / float(Molarch4 *year_days * 24 * 60 *60) / area_matrix_01
    calc_emi = np.sum(Flux_array_01_annual[:,:,iyear]/conversion_factor_annual)+np.sum(Emissions_nongrid[iyear])
    calc_emi2 = np.sum(Flux_array_01_annual_dom[:,:,iyear]/conversion_factor_annual)+\
                np.sum(Flux_array_01_annual_ind[:,:,iyear]/conversion_factor_annual)+np.sum(Emissions_nongrid[iyear])
    summary_emi = EPA_emi_ww_total.iloc[0,iyear+1] 
    emi_diff = abs(summary_emi-calc_emi)/((summary_emi+calc_emi)/2)
    if DEBUG==1:
        print(calc_emi)
        print(calc_emi2)
        print(summary_emi)
    if abs(emi_diff) < 0.0001:
        print('Year '+ year_range_str[iyear]+': Difference < 0.01%: PASS')
    else: 
        print('Year '+ year_range_str[iyear]+': Difference > 0.01%: FAIL, diff: '+str(emi_diff))
        
Flux_Emissions_Total_annual = Flux_array_01_annual
Flux_Emissions_Total_annual_dom = Flux_array_01_annual_dom
Flux_Emissions_Total_annual_ind = Flux_array_01_annual_ind
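
The unit conversion above is flux = E [kt] x 10^9 [g/kt] x N_A [molec./mol] / (M_CH4 [g/mol] x seconds per year x cell area [cm2]). The sketch below works through it for a single cell. It is illustrative only: `kt_to_flux` is a hypothetical helper, the constants are assumed values standing in for the notebook's `Avogadro` and `Molarch4`, the cell area of 1e12 cm2 (100 km2) is a round number, and `calendar.isleap` replaces the hard-coded 2012/2016 leap-year test.

```python
import calendar

AVOGADRO = 6.02214076e23  # molecules per mole
MOLAR_CH4 = 16.04         # g/mol (assumed value for the notebook's Molarch4)

def seconds_in_year(year):
    """Number of seconds in a calendar year, accounting for leap years."""
    days = 366 if calendar.isleap(year) else 365
    return days * 24 * 60 * 60

def kt_to_flux(emis_kt, area_cm2, year):
    """Convert annual emissions (kt) in one cell to a flux in molec./cm2/s."""
    grams = emis_kt * 1e9                    # kt -> g
    molecules = grams / MOLAR_CH4 * AVOGADRO # g -> molecules
    return molecules / seconds_in_year(year) / area_cm2

def flux_to_kt(flux, area_cm2, year):
    """Inverse conversion, used to check that mass is conserved."""
    molecules = flux * area_cm2 * seconds_in_year(year)
    return molecules / AVOGADRO * MOLAR_CH4 / 1e9

area = 1e12  # cm2, roughly the size of a 0.1 x 0.1 degree cell (assumed)
flux_2013 = kt_to_flux(1.0, area, 2013)
flux_2012 = kt_to_flux(1.0, area, 2012)  # leap year: same mass over more seconds
print(flux_2013, flux_to_kt(flux_2013, area, 2013))
```

With these assumed constants, 1 kt per year in such a cell corresponds to a flux on the order of 1e12 molec./cm2/s.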

-------------
## Step 5. Write netCDF
------------

In [None]:
# yearly data

#Total
#Initialize file
data_IO_fn.initialize_netCDF(gridded_outputfile, netCDF_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)
# Write data to netCDF
nc_out = Dataset(gridded_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded wastewater treatment fluxes written to file: {}" .format(os.getcwd())+gridded_outputfile)

#Domestic wastewater
# yearly data
#Initialize file
data_IO_fn.initialize_netCDF(gridded_dom_outputfile, netCDF_dom_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)
# Write data to netCDF
nc_out = Dataset(gridded_dom_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual_dom
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded domestic wastewater treatment fluxes written to file: {}" .format(os.getcwd())+gridded_dom_outputfile)

#Industrial wastewater
# yearly data
#Initialize file
data_IO_fn.initialize_netCDF(gridded_ind_outputfile, netCDF_ind_description, 0, year_range, loc_dimensions, Lat_01, Lon_01)
# Write data to netCDF
nc_out = Dataset(gridded_ind_outputfile, 'r+', format='NETCDF4')
nc_out.variables['emi_ch4'][:,:,:] = Flux_Emissions_Total_annual_ind
nc_out.close()
#Confirm file location
print('** SUCCESS **')
print("Gridded industrial wastewater treatment fluxes written to file: {}" .format(os.getcwd())+gridded_ind_outputfile)

----------
## Step 6. Plot Gridded Data
---------

#### Step 6.1. Plot Annual Emission Fluxes

In [None]:
#Plot Annual Data
#Total
scale_max = 10
save_flag = 0
save_fig = ''
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_str,scale_max,save_flag,save_fig)

# Dom
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual_dom, Lat_01, Lon_01, year_range, title_str_dom,scale_max,save_flag,save_fig)

#IND
data_plot_fn.plot_annual_emission_flux_map(Flux_Emissions_Total_annual_ind, Lat_01, Lon_01, year_range, title_str_ind,scale_max,save_flag,save_fig)


#### Step 6.2 Plot Difference between first and last inventory year

In [None]:
# Plot difference between last and first year
save_flag =0
save_fig = ''
#Total
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual, Lat_01, Lon_01, year_range, title_diff_str,save_flag,save_fig)

#Dom
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual_dom, Lat_01, Lon_01, year_range, title_diff_str_dom,save_flag,save_fig)

#IND
data_plot_fn.plot_diff_emission_flux_map(Flux_Emissions_Total_annual_ind, Lat_01, Lon_01, year_range, title_diff_str_ind,save_flag,save_fig)

In [None]:
ct = datetime.now() 
ft = ct.timestamp() 
time_elapsed = (ft-it)/(60*60)
print('Time to run: '+str(time_elapsed)+' hours')
print('** GEPA_5D_Wastewater: COMPLETE **')